Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits.
Miroslav Dudík
Dumitru Erhan
John Langford
Lihong Li
Published in:
UAI (2012)
Keyphrases
nonstationary
policy evaluation
random fields
least squares
sample size
temporal difference
model selection
fixed point
model free