Combinations and Mixtures of Optimal Policies in Unichain Markov Decision Processes are Optimal
Ronald Ortner
Published in: CoRR (2005)
Keyphrases
- markov decision processes
- average cost
- optimal policy
- finite state
- stationary policies
- dynamic programming
- finite horizon
- average reward
- state space
- initial state
- long run
- infinite horizon
- policy iteration
- reinforcement learning
- markov decision process
- multistage
- state dependent
- total reward
- discounted reward
- partially observable
- decision problems
- decision processes
- finite number
- optimal control
- semi markov decision processes
- state and action spaces
- control policies
- linear programming
- markov decision problems
- sufficient conditions
- control policy
- reinforcement learning algorithms
- optimality criterion
- decision diagrams