Optimal and Adaptive Off-policy Evaluation in Contextual Bandits.
Yu-Xiang Wang
Alekh Agarwal
Miroslav Dudík
Published in: ICML (2017)
Keyphrases
policy evaluation
least squares
dynamic programming
Monte Carlo
temporal difference
support vector machine
evaluation function