Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes.
Eugene A. Feinberg, Uriel G. Rothblum
Published in: Math. Oper. Res. (2012)
Keyphrases
- stationary policies
- markov decision processes
- total reward
- action sets
- optimal policy
- finite state
- state space
- dynamic programming
- markov decision process
- reinforcement learning
- average cost
- policy iteration
- planning under uncertainty
- linear program
- average reward
- reinforcement learning algorithms
- partially observable
- decision processes
- lot sizing
- reward function
- initial state
- action space
- machine learning
- dynamical systems
- sufficient conditions
- probability distribution