An Alternate Policy Gradient Estimator for Softmax Policies.
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood
Published in: CoRR (2021)
Keyphrases
- policy gradient
- policy gradient methods
- policy search
- variance reduction
- partially observable Markov decision processes
- actor-critic
- reinforcement learning
- natural actor-critic
- function approximation
- optimal policy
- gradient method
- least squares
- optimal control
- reinforcement learning algorithms
- model-free reinforcement learning
- temporal difference learning
- average reward
- maximum likelihood
- finite state
- Monte Carlo
- multi-agent
- importance sampling
- Markov decision processes
- learning algorithm
- approximation methods
- neural network
- recursive least squares
- linear regression
- convergence rate
- particle filter
- dynamic programming