Safe Reinforcement Learning via Observation Shielding.
Joe McCalmon, Tongtong Liu, Reid Goldsmith, Andrew Cyhaniuk, Talal Halabi, Sarra Alqahtani
Published in: HICSS (2023)
Keyphrases
- reinforcement learning
- function approximation
- temporal difference
- multi agent
- state space
- temporal difference learning
- learning algorithm
- markov decision processes
- machine learning
- multi agent reinforcement learning
- reinforcement learning algorithms
- model free
- optimal policy
- artificial neural networks
- case study
- supervised learning
- dynamic programming
- optimal control
- search space
- robot control
- markov decision process
- genetic algorithm
- evolutionary learning
- databases
- transition model
- perceptual aliasing
- robotic control