Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search.
Lars Buesing
Theophane Weber
Yori Zwols
Sébastien Racanière
Arthur Guez
Jean-Baptiste Lespiau
Nicolas Heess
Published in:
CoRR (2018)
Keyphrases
policy search
reinforcement learning
continuous state
continuous action
dynamic programming
reinforcement learning algorithms
model free
policy gradient
dynamic environments
function approximation
Monte Carlo methods