Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration.
Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald
Published in: CoRR (2022)
Keyphrases
- replicator dynamics
- policy gradient
- multi agent learning
- graph matching
- evolutionary game theory
- reinforcement learning
- gradient method
- multiagent learning
- neural network
- function approximation
- optimal control
- reinforcement learning algorithms
- approximation methods
- temporal difference learning
- action selection
- pattern recognition
- multi agent
- partially observable markov decision processes
- average reward
- single agent
- state action
- variance reduction
- reinforcement learning methods
- particle swarm optimization