Coordination without communication: optimal regret in two players multi-armed bandits.

Sébastien Bubeck, Thomas Budzinski

Published in: COLT (2020)

Keyphrases

multi-armed bandits
multi-armed bandit
bandit problems
worst case
information sharing
cooperative
game theory
dynamic programming
regret bounds
online learning
linear regression
multi-agent
lower bound
pairwise
decision makers