An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits.
Peter Auer, Chao-Kai Chiang
Published in: CoRR (2016)
Keyphrases
- worst case
- dynamic programming
- learning algorithm
- optimal solution
- regret bounds
- multi-armed bandit
- cost function
- preprocessing
- NP-hard
- optimality criterion
- detection algorithm
- search space
- Monte Carlo
- globally optimal
- computational complexity
- locally optimal
- image segmentation
- confidence bounds
- linear programming
- probabilistic model
- computational cost
- lower bound
- simulated annealing
- state space
- optimal policy
- loss function
- optimal path
- significant improvement
- stochastic approximation
- multi-agent
- reinforcement learning
- similarity measure