Bounded Policy Synthesis for POMDPs with Safe-Reachability Objectives
Yue Wang, Swarat Chaudhuri, Lydia E. Kavraki. Published in: AAMAS (2018)
Keyphrases
- action space
- state space
- continuous state
- markov decision processes
- reinforcement learning
- partially observable
- partially observable markov decision processes
- markov decision problems
- optimal policy
- belief state
- continuous state spaces
- policy search
- markov decision process
- dynamic programming
- state action
- dynamical systems
- reinforcement learning algorithms
- action selection
- markov chain
- partially observable markov decision process
- dec pomdps
- policy iteration
- partial observability
- belief space
- reinforcement learning problems
- policy iteration algorithm
- infinite horizon
- function approximation
- expected reward
- program synthesis
- predictive state representations
- average reward
- function approximators
- state dependent
- initial state
- asymptotically optimal
- reward function
- finite state
- texture synthesis
- partially observable environments
- point based value iteration
- learning algorithm
- model free reinforcement learning
- policy gradient
- finite horizon
- long run
- multiple objectives
- decision trees
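Several keyphrases above (belief state, belief space, partially observable markov decision process) center on reasoning over probability distributions rather than known states. As a minimal sketch of that idea, the following shows the standard Bayes belief update used in POMDPs; the transition and observation matrices are toy illustrative values, not taken from the paper.

```python
# Standard POMDP belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s).
# All numeric model parameters below are hypothetical toy values.

def belief_update(belief, action, obs, T, O):
    """Return the posterior belief after taking `action` and seeing `obs`.

    T[a][s][s2]  -- transition probability P(s2 | s, a)
    O[a][s2][o]  -- observation probability P(o | s2, a)
    """
    n = len(belief)
    new_b = [0.0] * n
    for s_next in range(n):
        # Predict: probability mass flowing into s_next under `action`.
        pred = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correct: weight the prediction by the observation likelihood.
        new_b[s_next] = O[action][s_next][obs] * pred
    total = sum(new_b)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in new_b]

# Toy 2-state, 1-action, 2-observation model (hypothetical numbers).
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]
b0 = [0.5, 0.5]                                  # uniform initial belief
b1 = belief_update(b0, action=0, obs=0, T=T, O=O)
```

Starting from a uniform belief, observing `obs=0` (which is likelier in state 0) shifts the posterior toward state 0; repeating this update along an action-observation sequence is what policy synthesis over the belief space builds on.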