Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem.
Antoine Salomon, Jean-Yves Audibert, Issam El Alaoui
Published in: J. Mach. Learn. Res. (2013)
Keyphrases
- lower bound
- upper bound
- control policies
- branch and bound algorithm
- stochastic inventory control
- optimal policy
- objective function
- monte carlo
- branch and bound
- upper and lower bounds
- optimal solution
- multi armed bandit problems
- optimal cost
- regret bounds
- lower bounding
- markov decision process
- search algorithm
- linear programming relaxation
- vc dimension
- worst case
- base stock policies
- search space