Open Problem: First-Order Regret Bounds for Contextual Bandits.

Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, Robert E. Schapire

Published in: COLT (2017)

Keyphrases

regret bounds
multi-armed bandit
lower bound
online learning
linear regression
upper bound
higher order
learning theory
special case
data points
least squares
Bregman divergences
online convex optimization