Safe Reinforcement Learning via Shielding for POMDPs.
Steven Carr, Nils Jansen, Sebastian Junges, Ufuk Topcu
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- partially observable Markov decision processes
- partially observable
- continuous state
- Markov decision processes
- function approximation
- model free
- state space
- policy search
- dynamic programming
- optimal policy
- reinforcement learning algorithms
- learning algorithm
- temporal difference
- machine learning
- transfer learning
- supervised learning
- multi agent
- knowledge base
- control problems
- hidden state
- learning problems
- Markov decision problems
- Dec-POMDPs