Policy Gradient using Weak Derivatives for Reinforcement Learning.
Sujay Bhatt, Alec Koppel, Vikram Krishnamurthy
Published in: CISS (2019)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- function approximation
- reinforcement learning algorithms
- policy search
- optimal control
- gradient method
- policy gradient methods
- model free
- reinforcement learning methods
- model free reinforcement learning
- state space
- average reward
- single agent
- state action
- partially observable Markov decision processes
- approximation methods
- function approximators
- machine learning
- policy iteration
- Markov decision process
- variance reduction
- temporal difference
- Markov decision processes
- optimal policy
- dynamic programming
- multi agent
- learning algorithm