Coordination without communication: optimal regret in two players multi-armed bandits.

Sébastien Bubeck, Thomas Budzinski

Published in: CoRR (2020)

Keyphrases

multi armed bandits
multi armed bandit
bandit problems
worst case
information sharing
game theory
online learning
cooperative
multiagent systems
optimal solution
multi agent
support vector machine
dynamic programming
decision making
loss function
multi agent systems
optimal strategy
reinforcement learning