Smoothing Policy Iteration for Zero-sum Markov Games.
Yangang Ren, Yao Lyu, Wenxuan Wang, Shengbo Eben Li, Zeyang Li, Jingliang Duan
Published in: CoRR (2022)
Keyphrases
- Markov games
- policy iteration
- Markov decision processes
- Markov decision process
- approximate policy iteration
- reinforcement learning
- state space
- optimal policy
- finite state
- reinforcement learning algorithms
- dynamic programming
- finite horizon
- average reward
- infinite horizon
- stochastic games
- Markov decision problems
- multiagent reinforcement learning
- average cost
- partially observable
- temporal difference learning
- multi agent
- learning algorithm
- decision problems
- control problems
- Monte Carlo