Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning.
Shixiang Gu, Tim Lillicrap, Richard E. Turner, Zoubin Ghahramani, Bernhard Schölkopf, Sergey Levine
Published in: NIPS (2017)
Keyphrases
- policy gradient
- gradient estimation
- variance reduction
- actor critic
- reinforcement learning
- policy search
- policy gradient methods
- function approximation
- monte carlo
- model free reinforcement learning
- sample size
- reinforcement learning algorithms
- gradient method
- optimal control
- importance sampling
- temporal difference
- reinforcement learning methods
- reward function
- state space
- approximation methods
- partially observable markov decision processes
- control problems
- confidence intervals
- neuro fuzzy
- natural actor critic
- markov decision processes