A policy iteration algorithm for Markov decision processes skip-free in one direction.
Joke Lambert, Benny Van Houdt, Chris Blondia
Published in: Numerical Methods for Structured Markov Chains (2007)
Keyphrases
- Markov decision processes
- policy iteration algorithm
- policy iteration
- finite state
- optimal policy
- reinforcement learning
- state space
- transition matrices
- dynamic programming
- reinforcement learning algorithms
- average reward
- infinite horizon
- average cost
- decision processes
- action space
- stochastic games
- partially observable
- Markov decision process
- planning under uncertainty
- partially observable Markov decision processes
- heuristic search
- least squares
- search space