Temporal Difference Uncertainties as a Signal for Exploration.
Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, André Barreto, Razvan Pascanu
Published in: CoRR (2020)
Keyphrases
- temporal difference
- action selection
- td learning
- reinforcement learning
- function approximation
- evaluation function
- model free
- temporal difference learning
- monte carlo
- step size
- reinforcement learning algorithms
- policy evaluation
- actor critic
- function approximators
- temporal difference methods
- supervised learning
- multi step
- learning process
- policy iteration
- multiscale
- predictive state representations