A Local Temporal Difference Code for Distributional Reinforcement Learning.
Pablo Tano, Peter Dayan, Alexandre Pouget
Published in: NeurIPS (2020)
Keyphrases
- temporal difference
- reinforcement learning
- function approximation
- td learning
- reinforcement learning algorithms
- evaluation function
- temporal difference learning
- model free
- step size
- policy evaluation
- actor critic
- action selection
- monte carlo
- temporal difference methods
- function approximators
- policy iteration
- supervised learning
- markov decision processes
- optimal control
- state space
- dynamic programming
- multi agent
- optimal policy
- artificial neural networks
- markov decision problems
- multiscale
- decision making
- machine learning