Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning.
Marek Grzes, Pascal Poupart
Published in: AAMAS (2015)
Keyphrases
- policy iteration
- markov decision processes
- markov decision problems
- policy evaluation
- partially observable
- partially observable markov decision processes
- optimal policy
- markov decision process
- reinforcement learning
- finite state
- state space
- planning problems
- sample path
- model free
- infinite horizon
- average reward
- planning under uncertainty
- fixed point
- least squares
- partially observable markov decision process
- dynamic programming
- evolutionary algorithm
- temporal difference
- linear programming
- optimal control
- heuristic search
- continuous state
- state and action spaces
- long run
- reward function
- initial state
- dynamical systems
- markov chain
- multi agent
- action selection
- average cost
- planning domains
- function approximation
- domain independent
- model checking
- multistage
- bayesian networks
- machine learning