A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees.
Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
Published in: CoRR (2024)
Keyphrases
- policy gradient
- reinforcement learning
- policy search
- average reward
- Markov decision processes
- reinforcement learning algorithms
- actor-critic
- function approximation
- partially observable Markov decision processes
- policy iteration
- initial state
- long run
- optimal policy
- upper bound
- optimal control
- dynamic programming
- reinforcement learning methods
- state space
- sample size
- state action
- gradient method
- reward function
- stochastic games
- approximation methods
- learning algorithm
- Markov decision process
- variance reduction
- neural network
- Markov decision problems
- control system
- computational complexity