Robbing the bandit: less regret in online geometric optimization against an adaptive adversary.
Varsha Dani, Thomas P. Hayes
Published in: SODA (2006)
Keyphrases
- online learning
- online algorithms
- real time
- bandit problems
- regret bounds
- learning algorithm
- random sampling
- upper confidence bound
- constrained optimization
- optimization problems
- binary classification
- global optimization
- optimization method
- optimization algorithm
- markov chain
- lower bound
- optimization model
- multi-armed bandit problems
- optimization process
- decision problems
- dynamic programming
- support vector
- reinforcement learning
- neural network