Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning.
Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- model free
- data driven
- function approximation
- state space
- databases
- optimal policy
- temporal difference learning
- adaptive systems
- reinforcement learning algorithms
- temporal difference
- adaptive learning
- markov decision processes
- database
- dynamic programming
- multi agent
- information retrieval
- real time