The Pitfalls of Regularization in Off-Policy TD Learning.
Gaurav Manek
J. Zico Kolter
Published in:
NeurIPS (2022)
Keyphrases
td learning
temporal difference
evaluation function
function approximation
reinforcement learning
reinforcement learning algorithms
policy evaluation
monte carlo
multi step
model free
multi agent
active learning
fixed point
action selection