Adversarial Policy Gradient for Alternating Markov Games.
Chao Gao, Martin Müller, Ryan Hayward
Published in: ICLR (Workshop) (2018)
Keyphrases
- policy gradient
- reinforcement learning algorithms
- reinforcement learning
- multiagent reinforcement learning
- state space
- Markov decision processes
- model free
- multi agent
- function approximation
- learning algorithm
- reinforcement learning methods
- temporal difference
- single agent
- stochastic games
- temporal difference learning
- optimal control
- reward function
- function approximators
- cooperative
- dynamic environments
- dynamic programming
- Monte Carlo