A near-optimal high-probability swap-regret upper bound for multi-agent bandits in unknown general-sum games.

Zhiming Huang, Jianping Pan

Published in: UAI (2023)

Keyphrases

upper bound
lower bound
multi agent
multi armed bandit
regret bounds
worst case
error probability
reinforcement learning
objective function
learning agents
online learning
fuzzy logic
graph cuts
intelligent agents
game theory
single agent
probability distribution
expert advice
multi agent systems
cooperative