Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning.
Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Wenchao Sun, Jianyu Chen. Published in: CoRR (2021)
Keyphrases
- model free
- reinforcement learning
- function approximators
- policy iteration
- policy evaluation
- reinforcement learning algorithms
- rl algorithms
- function approximation
- agent learns
- optimal policy
- hierarchical reinforcement learning
- temporal difference
- policy search
- action selection
- average reward
- markov decision problems
- state space
- markov decision processes
- learning agent
- constraint violations
- markov decision process
- partially observable markov decision processes
- policy gradient
- learning algorithm
- control policy
- hard constraints
- temporal difference learning
- reward signal
- reward function
- learning problems
- transfer learning
- action space
- partially observable
- infinite horizon
- learning tasks
- linear combination
- multi agent
- neural network
- agent receives
- impedance control