Supplementary material for Uncorrected least-squares temporal difference with lambda-return.

Takayuki Osogami

Published in: CoRR (2019)

Keyphrases

temporal difference
least squares
policy evaluation
TD learning
reinforcement learning
function approximation
evaluation function
policy iteration
Monte Carlo
model free
step size
temporal difference learning
action selection
reinforcement learning algorithms
temporal difference methods
optical flow
actor critic
learning algorithm
supervised learning
learning tasks
fixed point
Markov decision processes
function approximators
feature space