Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Lars Buesing
Theophane Weber
Yori Zwols
Nicolas Heess
Sébastien Racanière
Arthur Guez
Jean-Baptiste Lespiau
Published in:
ICLR (Poster) (2019)
Keyphrases
policy search
reinforcement learning
continuous state
continuous action
dynamic programming
reinforcement learning algorithms
partially observable Markov decision processes
reward function
policy gradient
robot navigation
function approximators
Markov decision problems