Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property.

Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm
Published in: CoRR (2023)
Keyphrases
  • multi player
  • policy gradient
  • game playing
  • reinforcement learning algorithms
  • control system
  • learning process
  • mobile robot
  • upper bound
  • convergence speed