CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning.
Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, Gang Pan
Published in: CoRR (2022)
Keyphrases
- dynamic programming
- high accuracy
- learning algorithm
- preprocessing
- reinforcement learning
- times faster
- k-means
- worst case
- computational cost
- detection algorithm
- expectation maximization
- particle swarm optimization
- experimental evaluation
- NP-hard
- simulated annealing
- cost function
- computational complexity
- segmentation algorithm
- objective function
- machine learning
- search algorithm
- clustering method
- optimal policy
- approximate dynamic programming