Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits.

Avishek Ghosh, Abishek Sankararaman

Published in: CoRR (2022)

Keyphrases

regret bounds
online learning
lower bound
linear regression
expert advice
multi-armed bandit
upper bound
contextual information
context sensitive
Bregman divergences
online convex optimization
closed form
linear predictors
multi-armed bandits