Denumerable controlled Markov chains with average reward criterion: Sample path optimality.
Rolando Cavazos-Cadena, Emmanuel Fernández-Gaucherand
Published in: Math. Methods Oper. Res. (1995)
Keyphrases
- sample path
- average reward
- markov chain
- optimality criterion
- steady state
- optimal policy
- markov decision processes
- markov model
- finite state
- long run
- monte carlo
- stationary distribution
- state space
- markov process
- policy iteration
- transition probabilities
- stochastic process
- model free
- state dependent
- bayesian networks
- reinforcement learning
- importance sampling