A policy gradient approach for Finite Horizon Constrained Markov Decision Processes.
Soumyajit Guin, Shalabh Bhatnagar
Published in: CoRR (2022)
Keyphrases
- markov decision processes
- finite horizon
- policy gradient
- reinforcement learning algorithms
- average reward
- reinforcement learning
- optimal policy
- actor critic
- partially observable markov decision processes
- policy iteration
- infinite horizon
- state space
- dynamic programming
- average cost
- stochastic games
- finite state
- markov decision process
- function approximation
- optimal control
- state action
- reward function
- control policies
- decision problems
- partially observable
- gradient method
- heuristic search
- multistage
- action space
- function approximators
- decision making