Strengthening Deterministic Policies for POMDPs.
Leonore Winterer, Ralf Wimmer, Nils Jansen, Bernd Becker
Published in: CoRR (2020)
Keyphrases
- partially observable Markov decision processes
- optimal policy
- policy search
- finite state
- policy gradient methods
- stochastic domains
- reinforcement learning
- dynamical systems
- predictive state representations
- Markov decision problems
- belief state
- dynamic programming
- continuous state
- planning under uncertainty
- Markov decision processes
- state space
- policy iteration algorithm
- decision problems
- expected reward
- belief space
- Markov decision process
- partially observable
- infinite horizon
- partial observability
- approximate solutions
- multi agent
- decision processes
- decision trees
- Dec-POMDPs
- revenue management
- average reward
- reward function
- lower bound
- linear programming