Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes.
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Published in: AAAI (2024)
Keyphrases
- average reward
- Markov decision processes
- total reward
- optimal policy
- infinite horizon
- policy iteration
- long run
- policy gradient
- dynamic programming
- reinforcement learning
- finite horizon
- stochastic games
- discount factor
- actor critic
- discounted reward
- average cost
- reinforcement learning algorithms
- state space
- Markov decision process
- learning algorithm
- finite state
- optimal control
- partially observable Markov decision processes
- state action
- model free
- decision problems
- partially observable
- reward function
- Markov chain
- linear programming
- search space
- computational complexity
- optimal solution
- machine learning