A Nonparametric Off-Policy Policy Gradient.
Samuele Tosatto, João Carvalho, Hany Abdulsamad, Jan Peters
Published in: CoRR (2020)
Keyphrases
- policy gradient
- actor critic
- function approximation
- reinforcement learning
- parametric optimization
- gradient method
- optimal control
- model-free reinforcement learning
- reinforcement learning algorithms
- approximation methods
- reinforcement learning methods
- partially observable Markov decision processes
- average reward
- single agent
- state action
- variance reduction
- temporal difference
- convergence speed
- neural network