Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm.
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Published in: CoRR (2024)
Keyphrases
- average reward
- optimal policy
- policy gradient
- actor critic
- markov decision processes
- infinite horizon
- long run
- total reward
- stochastic games
- reinforcement learning
- policy iteration
- primal dual
- finite horizon
- learning algorithm
- optimal control
- state action
- dynamic programming
- state space
- policy gradient reinforcement learning
- discount factor
- markov decision process
- model free
- partially observable markov decision processes
- average cost
- rl algorithms
- partially observable
- decision problems
- markov decision problems
- special case
- cost function
- optimal solution
- initial state
- convergence rate
- machine learning
- computational complexity
- np hard
- gradient method
- search space
- action space
- linear programming
- fixed point