Adversarial Policy Gradient for Alternating Markov Games.

Chao Gao, Martin Müller, Ryan Hayward

Published in: ICLR (Workshop) (2018)

Keyphrases

policy gradient
reinforcement learning algorithms
reinforcement learning
multiagent reinforcement learning
state space
Markov decision processes
model free
multi agent
function approximation
learning algorithm
reinforcement learning methods
temporal difference
single agent
stochastic games
temporal difference learning
optimal control
reward function
function approximators
cooperative
dynamic environments
dynamic programming
Monte Carlo