Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates.
Hugo Penedones
Carlos Riquelme
Damien Vincent
Hartmut Maennel
Timothy A. Mann
André Barreto
Sylvain Gelly
Gergely Neu
Published in:
CoRR (2019)
Keyphrases
temporal difference learning
temporal difference
function approximation
policy evaluation
reinforcement learning
evaluation function
least squares
fixed point
Monte Carlo
policy iteration
Markov decision process
Bayesian networks
search space
state space
model free
variance reduction