An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem.

Ryo Watanabe, Atsuyoshi Nakamura, Mineichi Kudo
Published in: Oper. Res. Lett. (2015)