Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm.
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
Published in: CoRR (2022)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- function approximation
- reinforcement learning algorithms
- policy search
- model free reinforcement learning
- policy gradient methods
- optimal control
- gradient method
- partially observable Markov decision processes
- state space
- optimal policy
- model free
- reinforcement learning methods
- temporal difference
- average reward
- approximation methods
- learning algorithm
- control problems
- dynamic environments
- Markov decision processes
- action space
- single agent
- variance reduction
- temporal difference learning
- action selection
- RL algorithms
- dynamic programming
- approximate dynamic programming
- NP-hard
- multi agent