The second order temporal difference error for Sarsa(λ).
Qi-ming Fu, Quan Liu, Fei Xiao, Guixin Chen
Published in: ADPRL (2013)
Keyphrases
- temporal difference
- reinforcement learning
- function approximation
- evaluation function
- reinforcement learning algorithms
- td learning
- temporal difference learning
- monte carlo
- model free
- step size
- action selection
- policy evaluation
- function approximators
- policy iteration
- supervised learning
- temporal difference methods
- convergence rate
- active learning
- machine learning
- data sets
- markov chain
- state space
- variance reduction
- feature space
- decision making