Feedback graph regret bounds for Thompson Sampling and UCB

Thodoris Lykouris, Éva Tardos, Drishti Wali

Published in: ALT (2020)
