Optimization of Average Rewards of Time Nonhomogeneous Markov Chains.
Xi-Ren Cao
Published in: IEEE Trans. Autom. Control. (2015)
Keyphrases
- markov chain
- steady state
- finite state
- transition probabilities
- monte carlo
- state space
- random walk
- stochastic process
- markov decision processes
- monte carlo method
- markov process
- monte carlo simulation
- markov model
- confidence intervals
- stationary distribution
- reinforcement learning
- markov processes
- average cost
- probabilistic automata
- random numbers
- transition matrix
- model selection
- algorithm