Cautious policy programming: exploiting KL regularization for monotonic policy improvement in reinforcement learning.
Lingwei Zhu, Takamitsu Matsubara
Published in: Mach. Learn. (2023)
Keyphrases
- optimal policy
- reinforcement learning
- policy search
- action selection
- Markov decision process
- state space
- partially observable environments
- Markov decision processes
- reinforcement learning problems
- machine learning
- learning algorithm
- neural network
- state and action spaces
- control policy
- partially observable
- infinite horizon
- long run
- least squares
- reinforcement learning algorithms
- programming language
- policy iteration
- action space
- reward function
- policy evaluation
- dynamic programming