Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation.
Rosa María Flores-Hernández
Published in: Kybernetika (2013)
Keyphrases
- markov decision processes
- optimal policy
- transition probabilities
- state space
- markov chain
- markov decision process
- stationary policies
- finite state
- reward function
- average reward
- infinite horizon
- finite horizon
- random walk
- reinforcement learning
- markov models
- total reward
- state dependent
- action space
- dynamic programming
- policy iteration
- long run
- partially observable
- average cost
- markov decision problems
- discount factor
- decision problems
- multistage
- real time dynamic programming
- discounted reward
- initial state
- markov model
- decision processes
- state variables
- sufficient conditions
- state abstraction
- belief state
- inventory level
- reinforcement learning algorithms
- hidden markov models
- search space