Thompson Sampling for Contextual Bandits with Linear Payoffs

Shipra Agrawal, Navin Goyal

Published in: CoRR (2012)

Keyphrases

contextual information
information systems
neural network
machine learning
real time
reinforcement learning
monte carlo
transfer function
sampled data
multi armed bandit
multi armed bandits