On the Convergence of Policy Iteration in Stationary Dynamic Programming.
Martin L. Puterman, Shelby L. Brumelle
Published in: Math. Oper. Res. (1979)
Keyphrases
- policy iteration
- dynamic programming
- markov decision processes
- stochastic approximation
- optimal policy
- infinite horizon
- markov decision problems
- convergence rate
- reinforcement learning
- optimal control
- state space
- sample path
- approximate dynamic programming
- finite state
- model free
- fixed point
- linear programming
- average reward
- markov decision process
- policy evaluation
- convergence speed
- multistage
- least squares
- function approximation
- stereo matching
- long run
- partially observable
- average cost
- neural network
- discounted reward
- actor critic
- temporal difference
- control strategy
- belief propagation
- linear program
- decision making
- machine learning