On Stationary Strategies in Countable State Total Reward Markov Decision Processes.
Jan van der Wal
Published in: Math. Oper. Res. (1984)
Keyphrases
- markov decision processes
- total reward
- stationary policies
- state space
- reinforcement learning
- optimal policy
- average reward
- markov decision process
- finite state
- reinforcement learning algorithms
- average cost
- partially observable
- dynamic programming
- infinite horizon
- lot sizing
- action space
- policy iteration
- linear program
- sufficient conditions
- decision processes
- action selection
- markov chain
- decision theoretic planning
- real time dynamic programming
- learning algorithm
- heuristic search
- planning under uncertainty
- transition matrices
- initial state
- markov decision problems
- search algorithm