An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits.

Peter Auer, Chao-Kai Chiang

Published in: COLT (2016)

Keyphrases

worst case
dynamic programming
optimal solution
learning algorithm
regret bounds
globally optimal
cost function
detection algorithm
optimization algorithm
preprocessing
search space
computational cost
multi-armed bandit
locally optimal
Monte Carlo
expectation maximization
simulated annealing
computational complexity
objective function
linear programming
least squares
significant improvement
matching algorithm
similarity measure
neural network