Per-decision Multi-step Temporal Difference Learning with Control Variates.

Kristopher De Asis, Richard S. Sutton

Published in: CoRR (2018)

Keyphrases

multi step
temporal difference learning
fixed point
function approximation
reinforcement learning
decision making
knn
game playing
temporal difference
reinforcement learning algorithms
evaluation function
Markov decision process
state space
k nearest neighbor
principal components
control strategy