Further Optimal Regret Bounds for Thompson Sampling.
Shipra Agrawal
Navin Goyal
Published in:
AISTATS (2013)
Keyphrases
regret bounds
multi armed bandit
training data
multi class
online learning
monte carlo
learning algorithm
similarity measure
optimal solution
closed form