An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward.
Shangdong Yang, Hao Wang, Yang Gao, Xingguo Chen
Published in: AAMAS (2018)
Keyphrases
- dynamic programming
- worst case
- optimal solution
- cost function
- k-means
- multi-armed bandit
- learning algorithm
- optimization algorithm
- closed form
- preprocessing
- matching algorithm
- expectation maximization
- computational complexity
- objective function
- experimental evaluation
- computational cost
- times faster
- globally optimal
- particle swarm optimization
- locally optimal
- recognition algorithm
- optimal strategy
- genetic algorithm
- stochastic approximation
- search space
- similarity measure
- segmentation algorithm
- reinforcement learning
- upper bound
- probabilistic model
- significant improvement
- evolutionary algorithm