Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
Miroslav Dudík
Dumitru Erhan
John Langford
Lihong Li
Published in:
CoRR (2012)
Keyphrases
nonstationary
policy evaluation
reinforcement learning
least squares
sample size
model free
image sequences
Markov decision processes