Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays.
Junpei Komiyama
Junya Honda
Hiroshi Nakagawa
Published in:
CoRR (2015)
Keyphrases
data analysis
multi-armed bandit
worst case
dynamic programming
monte carlo
data sets
neural network
machine learning
reinforcement learning
model selection
statistical analysis
multistage