A minimax and asymptotically optimal algorithm for stochastic bandits.
Pierre Ménard
Aurélien Garivier
Published in: CoRR (2017)