Coordination without communication: optimal regret in two players multi-armed bandits.
Sébastien Bubeck
Thomas Budzinski
Published in:
COLT (2020)
Keyphrases
multi armed bandits
multi armed bandit
bandit problems
worst case
information sharing
cooperative
game theory
dynamic programming
regret bounds
online learning
linear regression
multi agent
lower bound
pairwise
decision makers