Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

Published in: CoRR (2012)

Keyphrases

nonstationary
policy evaluation
reinforcement learning
least squares
sample size
model-free
image sequences
Markov decision processes