On the Convergence of Natural Policy Gradient and Mirror Descent-Like Policy Methods for Average-Reward MDPs.
Yashaswini Murthy, R. Srikant
Published in: CDC (2023)
Keyphrases
- average reward
- policy gradient
- optimal policy
- Markov decision processes
- long run
- reinforcement learning
- semi-Markov decision processes
- model free
- discounted reward
- policy search
- policy iteration
- actor critic
- partially observable Markov decision processes
- gradient method
- finite state
- optimization methods
- dynamic programming