An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning.
Richard S. Sutton
Ashique Rupam Mahmood
Martha White
Published in:
CoRR (2015)
Keyphrases
temporal difference learning
fixed point
function approximation
evaluation function
game playing
reinforcement learning
temporal difference
approximate value iteration
reinforcement learning algorithms
Markov decision process
Markov decision processes
policy iteration
learning algorithm
state space
Monte Carlo