Policy graph pruning and optimization in Monte Carlo Value Iteration for continuous-state POMDPs.
Weisheng Qian, Quan Liu, Zongzhang Zhang, Zhiyuan Pan, Shan Zhong
Published in: SSCI (2016)
Keyphrases
- monte carlo
- partially observable markov decision processes
- continuous state
- finite state
- markov chain
- policy search
- optimal policy
- reinforcement learning
- state space
- belief state
- monte carlo methods
- markov decision processes
- dynamic programming
- dynamical systems
- decision problems
- belief space
- control policies
- action space
- continuous state spaces
- search space
- multi agent
- particle filter
- monte carlo tree search
- state dependent
- infinite horizon
- policy gradient
- planning problems
- temporal difference
- partially observable
- markov decision process
- belief revision
- policy iteration
- initial state
- average reward
- machine learning
- transition probabilities
- steady state
- average cost
- approximate solutions
- function approximation
- point based value iteration