An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits.
Peter Auer, Chao-Kai Chiang
Published in: COLT (2016)
Keyphrases
- worst case
- dynamic programming
- optimal solution
- learning algorithm
- regret bounds
- globally optimal
- cost function
- detection algorithm
- optimization algorithm
- preprocessing
- search space
- computational cost
- multi-armed bandit
- locally optimal
- Monte Carlo
- expectation maximization
- simulated annealing
- computational complexity
- objective function
- linear programming
- least squares
- significant improvement
- matching algorithm
- similarity measure
- neural network