Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach.
Keith W. Ross, Ravi Varadarajan
Published in: Math. Oper. Res. (1991)
Keyphrases
- sample path
- Markov decision processes
- average reward
- policy iteration
- optimal policy
- state space
- finite state
- reinforcement learning
- dynamic programming
- long run
- partially observable
- average cost
- finite horizon
- infinite horizon
- Markov decision process
- asymptotic analysis
- decision problems
- Markov decision problems
- least squares
- model free
- Markov chain
- policy evaluation