On recursive temporal difference and eligibility traces.
Simone Baldi, Di Liu, Zichen Zhang
Published in: IECON (2020)
Keyphrases
- eligibility traces
- temporal difference
- reinforcement learning algorithms
- policy evaluation
- reinforcement learning
- td learning
- function approximation
- model free
- evaluation function
- monte carlo
- reinforcement learning methods
- step size
- least squares
- policy iteration
- supervised learning
- action selection
- markov decision processes
- state space
- function approximators
- multi agent
- dynamic programming
- learning agent
- partially observable markov decision processes
- importance sampling
- cost function
- optimal policy