Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient.
Samuele Tosatto, João Carvalho, Jan Peters
Published in: CoRR (2020)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- reinforcement learning algorithms
- function approximation
- policy search
- gradient method
- optimal control
- policy gradient methods
- reinforcement learning methods
- average reward
- multi agent
- state space
- model free
- model free reinforcement learning
- partially observable markov decision processes
- temporal difference
- approximation methods
- convergence rate
- markov decision processes
- machine learning
- neural network
- single agent
- state action
- variance reduction
- least squares
- dynamic programming
- learning algorithm