KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints.
Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz
Published in: CoRR (2018)
Keyphrases
- multi-armed bandit
- regret bounds
- distribution-free
- normal distribution
- large deviations
- linear regression
- online learning
- lower bound
- reinforcement learning
- probability distribution
- VC dimension
- special case
- model selection
- random variables
- probability density function
- learning algorithm
- upper bound
- Kullback-Leibler divergence
- KL divergence
- Bregman divergences