The Evolutionary Dynamics of Soft-Max Policy Gradient in Multi-Agent Settings.
Martino Bernasconi, Federico Cacciamani, Simone Fioravanti, Nicola Gatti, Francesco Trovò
Published in: AAMAS (2022)
Keyphrases
- policy gradient
- parametric optimization
- actor-critic
- reinforcement learning
- gradient method
- function approximation
- optimal control
- model-free reinforcement learning
- approximation methods
- reinforcement learning algorithms
- partially observable Markov decision processes
- supervised learning
- single agent
- average reward
- convergence rate
- variance reduction