The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems.
Paul J. Schweitzer, Awi Federgruen
Published in: Math. Oper. Res. (1977)
Keyphrases
- Markov decision problems
- state space
- Markov decision processes
- policy iteration
- optimal policy
- stochastic shortest path
- infinite horizon
- dynamic programming
- linear programming
- partially observable
- reinforcement learning
- decision theoretic
- decision processes
- average reward
- average cost
- model free
- fixed point
- utility function
- decision problems
- heuristic search
- Markov chain
- state transitions
- Markov decision process
- finite state
- long run
- state transition
- least squares
- decision making
- action space
- reinforcement learning algorithms
- NP-hard
- linear program
- hidden Markov models
- reward function
- temporal difference
- transition probabilities
- real valued
- lower bound
- state variables