The optimal unbiased value estimator and its relation to LSTD, TD and MC.

Steffen Grünewälder, Klaus Obermayer

Published in: Mach. Learn. (2011)

Keyphrases

temporal difference
reinforcement learning
least squares
policy evaluation
TD learning
temporal difference learning
function approximation
Monte Carlo
evaluation function
reinforcement learning algorithms
dynamic programming
step size
model free
policy iteration
optimal solution
eligibility traces
Markov decision processes
fixed point
variance reduction
action selection
Markov chain
maximum likelihood
supervised learning
learning algorithm
neural network
estimation error
function approximators