Directed Policy Gradient for Safe Reinforcement Learning with Human Advice.

Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé

Published in: CoRR (2018)

Keyphrases

policy gradient
reinforcement learning
actor critic
function approximation
policy search
reinforcement learning algorithms
optimal control
policy gradient methods
model free reinforcement learning
gradient method
variance reduction
approximation methods
function approximators
model free
Markov decision processes
reinforcement learning methods
partially observable Markov decision processes
average reward
temporal difference
real valued
supervised learning
state space
multi agent
single agent
learning problems
approximate dynamic programming
learning algorithm