O²TD: (Near)-Optimal Off-Policy TD Learning.
Bo Liu
Daoming Lyu
Wen Dong
Saad Biaz
Published in:
CoRR (2017)
Keyphrases
td learning
temporal difference
evaluation function
function approximation
reinforcement learning
policy evaluation
monte carlo
reinforcement learning algorithms
step size
model free
multi step
td methods
policy iteration
action selection
least squares
k nearest neighbor
markov chain