Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach.
R. Israel Ortega-Gutiérrez, Raúl Montes-de-Oca, Enrique Lemus-Rodríguez
Published in: Kybernetika (2016)
Keyphrases
- markov decision processes
- optimal policy
- variational principle
- infinite horizon
- finite state
- state space
- average reward
- finite horizon
- sufficient conditions
- dynamic programming
- policy iteration
- decision problems
- reinforcement learning
- long run
- average cost
- markov decision process
- multistage
- state dependent
- semi markov decision processes
- reward function
- reinforcement learning algorithms
- discounted reward
- discount factor
- total reward
- learning algorithm
- cost function
- image segmentation