A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion.
Rolando Cavazos-Cadena, Raúl Montes-de-Oca, Karel Sladký
Published in: J. Optim. Theory Appl. (2014)
Keyphrases
- average reward
- sample path
- markov decision chains
- optimality criterion
- average cost
- markov decision processes
- long run
- optimal policy
- policy iteration
- finite state
- model checking
- reinforcement learning
- finite horizon
- infinite horizon
- model free
- state space
- initial state
- markov chain
- decision problems
- dynamic programming
- multistage
- markov decision process
- markov decision problems
- partially observable
- least squares