Explore no more: improved high-probability regret bounds for non-stochastic bandits.
Gergely Neu
Published in:
CoRR (2015)
Keyphrases
multi-armed bandit
regret bounds
online learning
lower bound
reinforcement learning
linear regression
upper bound
image sequences
support vector
probability distribution
information theoretic
conditional probabilities