A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation.
Richard S. Sutton, Csaba Szepesvári, Hamid Reza Maei
Published in: NIPS (2008)
Keyphrases
- function approximation
- TD learning
- temporal difference
- function approximators
- temporal difference learning
- reinforcement learning
- actor critic
- learning algorithm
- temporal difference methods
- model free
- evaluation function
- learning tasks
- TD methods
- reinforcement learning algorithms
- Monte Carlo
- dynamic programming
- reinforcement learning problems
- cost function
- policy gradient
- neural network
- search space
- evolutionary methods
- policy evaluation
- learning process
- particle swarm optimization
- policy iteration
- action selection
- reinforcement learning methods
- gradient method
- active learning
- step size
- support vector
- convergence rate