Bounding reward measures of Markov models using the Markov decision processes.
Peter Buchholz
Published in: Numer. Linear Algebra Appl. (2011)
Keyphrases
- Markov models
- Markov decision processes
- average reward
- reinforcement learning
- reward function
- Markov model
- transition probabilities
- maximum entropy
- total reward
- expected reward
- discounted reward
- hidden state
- higher order
- optimal policy
- state space
- hidden Markov models
- finite state
- dynamic programming
- policy iteration
- reinforcement learning algorithms
- partially observable
- conditional random fields
- stationary policies
- decision theoretic planning
- long run
- finite horizon
- transition matrices
- infinite horizon
- Markov chain
- state and action spaces
- Markov decision process
- upper bound
- average cost
- action space
- machine learning
- temporal difference
- model checking
- special case
- Bayesian networks
- learning algorithm