A minimax and asymptotically optimal algorithm for stochastic bandits.

Pierre Ménard, Aurélien Garivier

Published in: CoRR (2017)

Keyphrases

asymptotically optimal
dynamic programming
worst case
NP-hard
Monte Carlo
learning algorithm
neural network
objective function
simulated annealing
optimal solution
search space
machine learning
special case
state space
supply chain