Convex synthesis of optimal policies for Markov Decision Processes with sequentially-observed transitions.
Mahmoud El Chamie, Behçet Açikmese
Published in: ACC (2016)
Keyphrases
- markov decision processes
- optimal policy
- finite state
- state space
- decision problems
- dynamic programming
- infinite horizon
- policy iteration
- average reward
- long run
- finite horizon
- reinforcement learning
- average cost
- multistage
- sufficient conditions
- state dependent
- control policies
- decision processes
- action space
- reinforcement learning algorithms
- markov decision process
- partially observable markov decision processes
- partially observable
- machine learning
- policy evaluation
- semi markov decision processes
- reward function
- markov chain