The Regret of Exploration and the Control of Bad Episodes in Reinforcement Learning.
Victor Boone, Bruno Gaujal
Published in: ICML (2023)
Keyphrases
- real robot
- reinforcement learning
- control problems
- action selection
- exploration exploitation
- exploration strategy
- optimal control
- adaptive control
- control strategies
- total reward
- bandit problems
- model free
- control system
- robot control
- function approximation
- lower bound
- machine learning
- cost sensitive
- optimal policy
- active learning
- multi agent systems
- active exploration
- robotic control
- model based reinforcement learning