Augmented Proximal Policy Optimization for Safe Reinforcement Learning.
Juntao Dai, Jiaming Ji, Long Yang, Qian Zheng, Gang Pan
Published in: AAAI (2023)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- Markov decision processes
- learning algorithm
- action selection
- Markov decision process
- optimization problems
- reinforcement learning problems
- optimization process
- state space
- global optimization
- policy iteration
- state and action spaces
- reinforcement learning algorithms
- average reward
- function approximators
- action space
- partially observable
- partially observable domains
- function approximation
- optimization algorithm
- optimization method
- partially observable environments
- machine learning
- reward function
- temporal difference
- infinite horizon
- constrained optimization
- learning problems
- state action
- policy evaluation
- multi-objective
- learning process
- multi-agent