Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs.
Haipeng Luo
Hanghang Tong
Mengxiao Zhang
Yuheng Zhang
Published in:
CoRR (2022)
Keyphrases
multi-armed bandit
multi-armed bandits
wide range
regret bounds
graph matching
reinforcement learning
probability distribution
relevance feedback
directed graph
graph model
expert advice
lower bound
pairwise
online learning
loss function
spanning tree
graph representation
graph clustering