Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms.
Romain Laroche, Remi Tachet des Combes
Published in: AISTATS (2022)
Keyphrases
- policy gradient
- actor critic
- policy gradient methods
- reinforcement learning
- policy iteration
- partially observable Markov decision processes
- approximation methods
- natural actor critic
- approximate dynamic programming
- gradient method
- neuro fuzzy
- function approximation
- optimal control
- reinforcement learning algorithms
- model free
- learning algorithm
- temporal difference
- average reward
- finite state
- optimization methods
- reinforcement learning methods
- computational complexity
- optimal policy
- function approximators
- radial basis function
- variance reduction
- RL algorithms
- fixed point
- multi agent
- Markov decision processes