The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate.

Published in: Math. Oper. Res. (2011)

Keyphrases

policy iteration
Markov decision processes
Markov decision problems
linear programming
optimal policy
infinite horizon
machine learning
reinforcement learning
multi-objective
finite state
transition matrices