Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits.
Qiwei Di
Tao Jin
Yue Wu
Heyang Zhao
Farzad Farnoud
Quanquan Gu
Published in:
CoRR (2023)
Keyphrases
regret bounds
multi-armed bandit
online learning
linear regression
lower bound
upper bound
prediction error
linear predictors