The Actor-Advisor: Policy Gradient With Off-Policy Advice.

Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé

Published in: CoRR (2019)

Keyphrases

partially observable Markov decision processes
policy gradient
reinforcement learning
actor critic
state space
parametric optimization
model free reinforcement learning
reinforcement learning algorithms
average reward
computational complexity
cost function
sufficient conditions
function approximation
optimal control
single agent