Optimality of LSTD and its Relation to MC.
Steffen Grünewälder, Sepp Hochreiter, Klaus Obermayer
Published in: IJCNN (2007)
Keyphrases
- reinforcement learning
- temporal difference
- least squares
- policy evaluation
- temporal difference learning
- TD learning
- policy iteration
- function approximation
- model free
- Markov decision processes
- Monte Carlo
- average reward
- evaluation function
- variance reduction
- cost function
- linear approximation
- optimal solution
- learning algorithm
- action selection
- supervised learning
- reinforcement learning algorithms
- state space
- reinforcement learning methods
- training data