Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property.
Ioannis Anagnostides
Ioannis Panageas
Gabriele Farina
Tuomas Sandholm
Published in:
CoRR (2023)
Keyphrases
multi player
policy gradient
game playing
reinforcement learning algorithms
control system
learning process
mobile robot
upper bound
convergence speed