Optimal and Adaptive Off-policy Evaluation in Contextual Bandits.
Yu-Xiang Wang
Alekh Agarwal
Miroslav Dudík
Published in:
CoRR (2016)
Keyphrases
policy evaluation
worst case
optimal solution
dynamic programming
temporal difference
model free
least squares
function approximation
machine learning
reinforcement learning
multi agent
markov decision processes
optimal control