An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
Shangdong Yang, Yang Gao
Published in: IEEE Trans. Neural Networks Learn. Syst. (2021)
Keyphrases
- dynamic programming
- optimal solution
- computational cost
- worst case
- learning algorithm
- cost function
- multi-armed bandit
- locally optimal
- experimental evaluation
- computational complexity
- detection algorithm
- k-means
- genetic algorithm
- exhaustive search
- path planning
- matching algorithm
- particle swarm optimization
- high accuracy
- probabilistic model
- NP-hard
- search space
- closed form
- globally optimal
- optimal strategy
- linear programming
- objective function
- Monte Carlo
- convergence rate
- recognition algorithm
- optimal path
- control policy
- feature selection
- significant improvement