Graph regret bounds for Thompson Sampling and UCB.
Thodoris Lykouris
Éva Tardos
Drishti Wali
Published in:
CoRR (2019)
Keyphrases
multi-armed bandit
regret bounds
reinforcement learning
lower bound
probability distribution
online learning
random sampling
sampling algorithm