Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games.
Shicong Cen, Yuejie Chi, Simon Shaolei Du, Lin Xiao
Published in: ICLR (2023)
Keyphrases
- markov games
- markov decision processes
- markov decision process
- multiagent reinforcement learning
- reinforcement learning algorithms
- reinforcement learning
- control problems
- optimal policy
- state space
- multiagent systems
- stochastic games
- infinite horizon
- policy iteration
- finite horizon
- nash equilibrium
- multi agent
- average reward
- cooperative
- finite state
- optimal control
- optimal stopping
- average cost
- machine learning
- markov decision problems
- sufficient conditions
- robust optimization
- dynamic programming
- reward function
- model free
- convergence speed
- incomplete information