Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning.
Donghoon Lee
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- policy gradient
- reinforcement learning algorithms
- function approximation
- actor critic
- state space
- policy search
- model free
- learning algorithm
- multi agent
- continuous state and action spaces
- temporal difference
- reinforcement learning methods
- optimal policy
- machine learning
- state action
- action space
- temporal difference learning
- rl algorithms
- markov decision processes
- dynamic programming
- optimal control
- single agent
- evaluation function
- approximate dynamic programming
- cooperative
- model free reinforcement learning