Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits.

Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
Published in: CoRR (2023)
Keyphrases
  • regret bounds
  • multi-armed bandit
  • online learning
  • linear regression
  • lower bound
  • upper bound
  • prediction error
  • linear predictors