A minimax and asymptotically optimal algorithm for stochastic bandits.

Pierre Ménard, Aurélien Garivier

Published in: ALT (2017)

Keyphrases

asymptotically optimal
dynamic programming
computational complexity
worst case
Monte Carlo
learning algorithm
objective function
search space
simulated annealing
machine learning
optimal solution
lower bound
NP-hard
evaluation function
lot sizing
game tree