Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation.
Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, Alois Knoll
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- policy gradient
- function approximation
- Markov decision processes
- optimization algorithm
- multi agent
- optimization process
- state space
- global optimization
- eligibility traces
- viewpoint
- model free
- constrained optimization
- reward function
- reinforcement learning algorithms
- temporal difference
- partially observable
- neural network
- mobile robot
- learning algorithm
- optimization method
- average reward
- reward shaping
- agent receives
- edge detection
- supervised learning
- dynamic programming
- multi objective
- learning process
- machine learning