Combining policy gradient and Q-learning

Brendan O'Donoghue, Rémi Munos, Koray Kavukcuoglu, Volodymyr Mnih

Published in: ICLR (Poster) (2017)

Keyphrases

policy gradient
reinforcement learning
function approximation
actor critic
reinforcement learning algorithms
model-free reinforcement learning
state-action
single agent
state space
cooperative
multi-agent
learning algorithm
dynamic programming
temporal difference
optimal control
function approximators
policy search
reinforcement learning methods
temporal difference learning
gradient method
learning tasks
model-free