Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning.
Shixiang Gu
Tim Lillicrap
Richard E. Turner
Zoubin Ghahramani
Bernhard Schölkopf
Sergey Levine
Published in:
NIPS (2017)
Keyphrases
policy gradient
gradient estimation
variance reduction
actor critic
reinforcement learning
policy search
policy gradient methods
function approximation
Monte Carlo
model free reinforcement learning
sample size
reinforcement learning algorithms
gradient method
optimal control
importance sampling
temporal difference
reinforcement learning methods
reward function
state space
approximation methods
partially observable Markov decision processes
control problems
confidence intervals
neuro fuzzy
natural actor critic
Markov decision processes