Contextual Bandit Learning with Predictable Rewards

Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

Published in: CoRR (2012)

Keyphrases

reinforcement learning
learning systems
learning algorithm
supervised learning
learning process
natural language processing
online learning
unsupervised learning
contextual bandit
upper confidence bound