Uncorrected Least-Squares Temporal Difference with Lambda-Return.

Takayuki Osogami

Published in: AAAI (2020)

Keyphrases

temporal difference
least squares
policy evaluation
TD learning
reinforcement learning
evaluation function
function approximation
step size
Monte Carlo
model free
temporal difference learning
policy iteration
reinforcement learning algorithms
temporal difference methods
action selection
fixed point
actor critic
optical flow
cost function
Markov decision processes
optimization algorithm
neural network
state space
predictive state representations