On the Convergence of Policy Iteration in Finite State Undiscounted Markov Decision Processes: The Unichain Case.
Arie Hordijk, Martin L. Puterman
Published in: Math. Oper. Res. (1987)
Keyphrases
- finite state
- markov decision processes
- policy iteration
- optimal policy
- average reward
- reinforcement learning
- policy evaluation
- state space
- average cost
- dynamic programming
- action sets
- approximate dynamic programming
- stationary policies
- transition matrices
- markov decision problems
- partially observable markov decision processes
- markov decision process
- partially observable
- infinite horizon
- markov chain
- factored mdps
- reinforcement learning algorithms
- discounted reward
- continuous state
- actor critic
- policy iteration algorithm
- sufficient conditions
- state and action spaces
- reward function
- action space
- model checking
- decision problems
- convergence speed
- machine learning