Weighted importance sampling for off-policy learning with linear function approximation.
Ashique Rupam Mahmood
Hado van Hasselt
Richard S. Sutton
Published in:
NIPS (2014)
Keyphrases
function approximation
reinforcement learning
learning tasks
importance sampling
temporal difference learning algorithms
function approximators
learning algorithm
decision trees
learning process
supervised learning
Monte Carlo
temporal difference methods
active learning