KL-UCB-Switch: Optimal Regret Bounds for Stochastic Bandits from Both a Distribution-Dependent and a Distribution-Free Viewpoints.
Aurélien Garivier
Hédi Hadiji
Pierre Ménard
Gilles Stoltz
Published in:
J. Mach. Learn. Res. (2022)
Keyphrases
multi armed bandit
regret bounds
distribution free
normal distribution
large deviations
online learning
reinforcement learning
lower bound
linear regression
upper bound
vc dimension
probability density function
random variables
monte carlo