Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion.

Junghyun Lee, Se-Young Yun, Kwang-Sung Jun

Published in: AISTATS (2024)

Keyphrases

regret bounds
reinforcement learning
multi-armed bandit
lower bound
expert advice
active learning
probability distribution