Minimax Policies for Adversarial and Stochastic Bandits.
Jean-Yves Audibert, Sébastien Bubeck
Published in: COLT (2009)
Keyphrases
- stochastic systems
- control policies
- minimax search
- monte carlo
- optimal policy
- stochastic inventory control
- multi armed bandit problems
- evaluation function
- stochastic model
- stochastic optimization
- stochastic models
- echelon stock
- multi agent
- multi armed bandits
- asymptotic properties
- multi armed bandit
- regret bounds
- state dependent
- base stock policies
- learning automata