Using Options for Long-Horizon Off-Policy Evaluation.
Zhaohan Daniel Guo
Philip S. Thomas
Emma Brunskill
Published in:
CoRR (2017)
Keyphrases
policy evaluation
least squares
reinforcement learning
temporal difference
Monte Carlo
model free
variance reduction
policy iteration
Markov decision processes
matrix inversion
function approximation
semi parametric
statistical inference
neural network
decision making
evaluation function