Instance-dependent Regret Bounds for Dueling Bandits.

Akshay Balsubramani, Zohar S. Karnin, Robert E. Schapire, Masrour Zoghi

Published in: COLT (2016)

Keyphrases

regret bounds
multi-armed bandit
online learning
linear regression
lower bound
upper bound
feature selection
optimal solution
support vector
Bregman divergences
online convex optimization