A Nonparametric Off-Policy Policy Gradient.
Samuele Tosatto, João Carvalho, Hany Abdulsamad, Jan Peters
Published in: AISTATS (2020)
Keyphrases
- policy gradient
- parametric optimization
- actor-critic
- function approximation
- reinforcement learning
- optimal control
- model-free reinforcement learning
- gradient method
- reinforcement learning algorithms
- approximation methods
- reinforcement learning methods
- variance reduction
- average reward
- function approximators
- single agent
- temporal difference
- mobile robot