Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis.
Rohan Mitta, Hosein Hasanbeig, Jun Wang, Daniel Kroening, Yiannis Kantaros, Alessandro Abate
Published in: CoRR (2023)
Keyphrases
- control policy
- reinforcement learning
- exploration exploitation
- exploration strategy
- approximate dynamic programming
- control policies
- admission control
- action selection
- long run
- function approximation
- Bayesian networks
- state space
- model based reinforcement learning
- model free
- batch mode
- design space exploration
- active exploration
- learning algorithm
- temporal difference
- exploration exploitation tradeoff
- balancing exploration and exploitation
- Markov decision processes
- reinforcement learning algorithms
- Bayesian inference
- average cost
- optimal policy
- supervised learning
- single agent
- posterior probability
- transfer learning
- autonomous learning
- Markov chain
- NP-hard
- multi agent
- machine learning
- posterior distribution