Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes.
Washim Uddin Mondal, Vaneet Aggarwal
Published in: CoRR (2023)
Keyphrases
- markov decision processes
- average reward
- infinite horizon
- policy iteration
- optimal policy
- dynamic programming
- complexity analysis
- discounted reward
- long run
- state space
- finite horizon
- policy gradient
- reinforcement learning algorithms
- reinforcement learning
- learning algorithm
- average cost
- finite state
- partially observable
- optimality criterion
- computational complexity
- stochastic games
- actor critic
- cost function
- action space
- np hard
- optimal solution
- state and action spaces
- markov chain
- fixed point
- decision problems
- markov decision problems