Loss bounds for uncertain transition probabilities in Markov decision processes.
Andrew Mastin, Patrick Jaillet
Published in: CDC (2012)
Keyphrases
- Markov decision processes
- transition probabilities
- Markov chain
- state space
- random walk
- reward function
- Markov models
- finite state
- temporal difference learning
- dynamic programming
- Markov decision process
- policy iteration
- reinforcement learning
- optimal policy
- partially observable
- reinforcement learning algorithms
- Markov decision problems
- average cost
- link structure
- infinite horizon
- Monte Carlo
- worst case
- Markov model
- maximum entropy
- fixed point
- higher order
- action space
- probabilistic model
- machine learning