Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits.

Haipeng Luo, Mengxiao Zhang, Peng Zhao, Zhi-Hua Zhou

Published in: CoRR (2022)

Keyphrases

regret bounds
multi-armed bandits
multi-armed bandit
lower bound
multi-armed bandit problems
online learning
linear regression
stochastic systems
expert advice
upper bound
bandit problems
case study
loss function
Bregman divergences
linear predictors
test bed
worst case
reinforcement learning