lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits
Kevin G. Jamieson
Matthew Malloy
Robert D. Nowak
Sébastien Bubeck
Published in: CoRR (2013)
Keyphrases
optimal solution
worst case
dynamic programming
objective function
closed form
learning algorithm
computational complexity
NP-hard
multi-armed bandit
expectation maximization
Monte Carlo
log-likelihood
linear programming
optimal policy
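To give a sense of the algorithm the title refers to: lil' UCB is a best-arm identification method that pulls the arm maximizing an upper confidence bound whose deviation term is motivated by the law of the iterated logarithm, and stops once one arm has been pulled much more than all others combined. The sketch below is an illustrative simplification, not the paper's exact procedure: the constants in the confidence term and the stopping threshold (`lam`), the `pull` callback, and the parameter defaults are all assumptions chosen for readability, not the tuned values from the paper.

```python
import math

def lil_ucb_sketch(pull, n_arms, delta=0.05, eps=0.01, lam=9.0, max_pulls=20000):
    """Simplified lil' UCB-style best-arm identification.

    pull(i) -> float reward for arm i. Repeatedly samples the arm with the
    largest empirical mean plus an iterated-logarithm-style confidence bound;
    stops when one arm's pull count exceeds 1 + lam * (pulls of all others).
    Constants are illustrative, not the paper's exact tuning.
    """
    counts = [1] * n_arms
    sums = [pull(i) for i in range(n_arms)]  # one initial pull per arm
    total = n_arms

    def ucb(i):
        t = counts[i]
        # log(log(.)) deviation term; the +e keeps the inner log >= 1
        dev = math.sqrt(2.0 * math.log(math.log((1 + eps) * t + math.e) / delta) / t)
        return sums[i] / counts[i] + dev

    while total < max_pulls:
        best = max(range(n_arms), key=ucb)
        sums[best] += pull(best)
        counts[best] += 1
        total += 1
        # stopping rule: one arm dominates the combined count of the rest
        if counts[best] >= 1 + lam * (total - counts[best]):
            return best
    return max(range(n_arms), key=lambda i: sums[i] / counts[i])
```

With deterministic rewards of 0.1 and 0.9 for two arms, the loop quickly concentrates its pulls on the second arm and the stopping rule fires, returning index 1.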