Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates.
Carlos Riquelme, Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy A. Mann, André Barreto, Gergely Neu
Published in: NeurIPS (2019)
Keyphrases
- temporal difference learning
- temporal difference
- policy evaluation
- function approximation
- reinforcement learning
- policy iteration
- evaluation function
- fixed point
- state space
- monte carlo
- confidence intervals
- radial basis function
- support vector
- learning algorithm
- markov decision processes
- learning tasks
- game playing
- model free
- partially observable