The optimal unbiased value estimator and its relation to LSTD, TD and MC.
Steffen Grünewälder, Klaus Obermayer
Published in: Mach. Learn. (2011)
Keyphrases
- temporal difference
- reinforcement learning
- least squares
- policy evaluation
- TD learning
- temporal difference learning
- function approximation
- Monte Carlo
- evaluation function
- reinforcement learning algorithms
- dynamic programming
- step size
- model free
- policy iteration
- optimal solution
- eligibility traces
- Markov decision processes
- fixed point
- variance reduction
- action selection
- Markov chain
- maximum likelihood
- supervised learning
- learning algorithm
- neural network
- estimation error
- function approximators