Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs.

Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

Published in: CoRR (2022)

Keyphrases

multi-armed bandit
multi-armed bandits
wide range
regret bounds
graph matching
reinforcement learning
probability distribution
relevance feedback
directed graph
graph model
expert advice
lower bound
pairwise
online learning
loss function
spanning tree
graph representation
graph clustering