Policy iteration for bounded-parameter POMDPs.
Yaodong Ni, Zhi-Qiang Liu
Published in: Soft Comput. (2013)
Keyphrases
- policy iteration
- Markov decision processes
- reinforcement learning
- policy iteration algorithm
- optimal policy
- Markov decision problems
- finite state
- average reward
- partially observable Markov decision processes
- sample path
- model-free
- state space
- Markov decision process
- least squares
- infinite horizon
- partially observable
- policy evaluation
- fixed point
- temporal difference
- dynamic programming
- continuous state
- linear programming
- reinforcement learning algorithms
- function approximation
- convergence rate
- actor-critic
- Markov chain
- action space
- belief state
- policy search