On the Asymptotic Behaviour of a Constant Stepsize Temporal-Difference Learning Algorithm.
Vladislav Tadic
Published in: EuroCOLT (1999)
Keyphrases
- temporal difference
- step size
- learning algorithm
- reinforcement learning algorithms
- reinforcement learning
- td learning
- policy evaluation
- function approximation
- supervised learning
- evaluation function
- convergence rate
- cost function
- model free
- temporal difference learning
- monte carlo
- convergence speed
- machine learning
- machine learning algorithms
- training data
- action selection
- active learning
- policy iteration
- state space
- function approximators
- learning process
- markov decision processes