From MDP to POMDP and Back: Safety and Compositionality.
Manuela-Luminita Bujorianu, Tristan Caulfield, David J. Pym, Rafael Wisniewski
Published in: ECC (2023)
Keyphrases
- markov decision process
- partially observable markov decision processes
- finite state
- optimal policy
- partially observable
- reinforcement learning
- state space
- reward function
- markov decision problems
- planning under uncertainty
- state and action spaces
- bayesian reinforcement learning
- belief space
- decision problems
- belief state
- partial observability
- markov chain
- decision theoretic
- infinite horizon
- policy iteration
- dynamical systems
- dynamic programming
- multi agent
- continuous state
- initial state
- reinforcement learning algorithms
- transition probabilities
- linear programming
- average reward
- hidden state
- utility function
- policy evaluation
- model checking
- model free reinforcement learning
- average cost
- long run
- domain independent