Weighted importance sampling for off-policy learning with linear function approximation.

Ashique Rupam Mahmood, Hado van Hasselt, Richard S. Sutton

Published in: NIPS (2014)

Keyphrases

function approximation
reinforcement learning
learning tasks
importance sampling
temporal difference learning algorithms
function approximators
learning algorithm
decision trees
learning process
supervised learning
Monte Carlo
temporal difference methods
active learning