Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning.
Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara. Published in: CoRR (2021)
Keyphrases
- optimal policy
- reinforcement learning
- partially observable environments
- action selection
- Markov decision processes
- policy search
- control policies
- partially observable
- Markov decision process
- control policy
- learning algorithm
- reinforcement learning problems
- action space
- partially observable domains
- machine learning
- policy evaluation
- policy iteration
- asymptotically optimal
- reinforcement learning algorithms
- reward function
- maximum likelihood
- significant improvement