Feedback graph regret bounds for Thompson Sampling and UCB.
Thodoris Lykouris
Éva Tardos
Drishti Wali
Published in:
ALT (2020)
Keyphrases
multi-armed bandit
regret bounds
reinforcement learning
lower bound
e learning
support vector
mutual information
em algorithm
linear regression
learning theory