Variance-aware Regret Bounds for Stochastic Contextual Dueling Bandits.
Qiwei Di
Tao Jin
Yue Wu
Heyang Zhao
Farzad Farnoud
Quanquan Gu
Published in:
ICLR (2024)
Keyphrases
regret bounds
lower bound
online learning
linear regression
multi-armed bandit
upper bound
least squares
machine learning
objective function
maximum likelihood
Bregman divergences