Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning.
Nathan Kallus
Masatoshi Uehara
Published in:
NeurIPS (2019)
Keyphrases
reinforcement learning
policy evaluation
temporal difference
least squares
Markov decision processes
function approximation
policy iteration
model free
Monte Carlo
optimal policy
TD learning
machine learning
variance reduction
learning algorithm
cost function