Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits.

Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

Published in: CoRR (2023)

Keyphrases

regret bounds
multi-armed bandit
online learning
linear regression
lower bound
upper bound
prediction error
linear predictors