Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm.
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
Published in: AAAI (2023)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- reinforcement learning algorithms
- function approximation
- policy search
- policy gradient methods
- gradient method
- optimal control
- model free reinforcement learning
- reinforcement learning methods
- temporal difference
- machine learning
- average reward
- function approximators
- partially observable Markov decision processes
- learning algorithm
- state action
- optimal policy
- state space
- multi agent
- single agent
- control problems
- approximation methods
- model free
- Markov decision processes
- variance reduction
- model checking
- neural network