Safe Policy Improvement for POMDPs via Finite-State Controllers.
Thiago D. Simão, Marnix Suilen, Nils Jansen. Published in: AAAI (2023)
Keyphrases
- partially observable Markov decision processes
- optimal policy
- policy search
- partially observable
- Markov decision processes
- Markov decision problems
- policy gradient
- Markov decision process
- point based value iteration
- dynamic programming
- belief state
- finite state
- reinforcement learning
- dynamical systems
- state space
- expected reward
- temporal difference
- infinite horizon
- decision problems
- policy iteration
- decision processes
- belief space
- distributed constraint optimization