Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime.
Bekzhan Kerimkulov, James-Michael Leahy, David Siska, Lukasz Szpruch
Published in: CoRR (2022)
Keyphrases
- policy gradient
- neural network
- reinforcement learning
- policy search
- approximation methods
- average reward
- markov decision processes
- actor critic
- function approximation
- function approximators
- reinforcement learning algorithms
- partially observable markov decision processes
- convergence rate
- long run
- reward function
- markov random field
- policy iteration
- gradient method
- state space
- partially observable
- markov decision problems
- optimal control
- least squares
- stochastic games
- temporal difference
- finite state
- optimal policy
- learning algorithm
- machine learning