Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime.
James-Michael Leahy, Bekzhan Kerimkulov, David Siska, Lukasz Szpruch
Published in: ICML (2022)
Keyphrases
- policy gradient
- neural network
- reinforcement learning
- policy search
- approximation methods
- average reward
- markov decision processes
- actor critic
- reinforcement learning algorithms
- function approximation
- function approximators
- gradient method
- optimal policy
- artificial neural networks
- convergence rate
- partially observable markov decision processes
- least squares
- optimal control
- convergence speed
- markov random field
- dynamic programming
- reinforcement learning methods
- markov decision problems
- state space
- multi agent
- state action
- stochastic games
- single agent
- temporal difference
- finite state