An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning.

Richard S. Sutton, Ashique Rupam Mahmood, Martha White

Published in: CoRR (2015)

Keyphrases

temporal difference learning
fixed point
function approximation
evaluation function
game playing
reinforcement learning
temporal difference
approximate value iteration
reinforcement learning algorithms
Markov decision process
Markov decision processes
policy iteration
learning algorithm
state space
Monte Carlo