An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem.

Published in: Oper. Res. Lett. (2015)
