Optimality of LSTD and its Relation to MC.

Steffen Grünewälder, Sepp Hochreiter, Klaus Obermayer

Published in: IJCNN (2007)

Keyphrases

reinforcement learning
temporal difference
least squares
policy evaluation
temporal difference learning
TD learning
policy iteration
function approximation
model free
Markov decision processes
Monte Carlo
average reward
evaluation function
variance reduction
cost function
linear approximation
optimal solution
learning algorithm
action selection
supervised learning
reinforcement learning algorithms
state space
reinforcement learning methods
training data