An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Ehsan Imani, Eric Graves, Martha White

Published in: CoRR (2018)

Keyphrases

policy gradient
parametric optimization
reinforcement learning
actor critic
gradient method
function approximation
model free reinforcement learning
optimal control
approximation methods
reinforcement learning algorithms
partially observable Markov decision processes
variance reduction
average reward
machine learning
state space
state action