Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient.

Samuele Tosatto, João Carvalho, Jan Peters

Published in: CoRR (2020)

Keyphrases

policy gradient
reinforcement learning
actor critic
reinforcement learning algorithms
function approximation
policy search
gradient method
optimal control
policy gradient methods
reinforcement learning methods
average reward
multi agent
state space
model free
model free reinforcement learning
partially observable Markov decision processes
temporal difference
approximation methods
convergence rate
Markov decision processes
machine learning
neural network
single agent
state action
variance reduction
least squares
dynamic programming
learning algorithm