On the Design of Estimators for Bandit Off-Policy Evaluation.
Nikos Vlassis
Aurélien Bibaut
Maria Dimakopoulou
Tony Jebara
Published in:
ICML (2019)
Keyphrases
policy evaluation
least squares
reinforcement learning
Monte Carlo
temporal difference
random sampling