Computational comparison of policy iteration algorithms for discounted Markov decision processes.
R. Hartley, A. C. Lavercombe, Lyn C. Thomas
Published in: Comput. Oper. Res. (1986)
Keyphrases
- markov decision processes
- policy iteration
- optimal policy
- factored mdps
- finite state
- average reward
- reinforcement learning
- sample path
- infinite horizon
- state space
- dynamic programming
- policy evaluation
- markov decision process
- average cost
- approximate dynamic programming
- policy iteration algorithm
- partially observable
- transition matrices
- model free
- finite horizon
- fixed point
- markov decision problems
- actor critic
- stochastic shortest path
- decision processes
- partially observable markov decision processes
- temporal difference
- lead time
- monte carlo
- linear programming
- state and action spaces
- discounted reward
- least squares
- learning algorithm
- machine learning