SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization.
Jaafar Mhamed, Shangding Gu
Published in: CoRR (2023)
Keyphrases
- policy iteration
- actor critic
- reinforcement learning
- temporal difference
- Markov decision processes
- optimal policy
- model free
- policy evaluation
- reinforcement learning algorithms
- function approximation
- approximate dynamic programming
- reinforcement learning problems
- finite state
- state space
- infinite horizon
- global optimization
- temporal difference learning
- step size
- dynamic programming
- Monte Carlo
- optimization problems
- policy gradient
- action space
- optimal control
- linear programming
- learning algorithm
- state and action spaces
- optimization method
- optimization algorithm
- state action
- policy gradient methods
- machine learning
- function approximators
- reward function
- action selection
- long run
- optimization process
- multi agent