Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets.
Eugene A. Feinberg, Pavlo O. Kasyanov, Michael Z. Zgurovsky
Published in: ADPRL (2014)
Keyphrases
- total cost
- action sets
- Markov decision processes
- state space
- reinforcement learning
- average cost
- partially observable
- finite state
- stationary policies
- optimal solution
- belief state
- special case
- action space
- Markov decision process
- dynamic programming
- minimum total cost
- policy iteration
- planning under uncertainty
- Markov decision problems
- machine learning
- production cost
- state variables
- inventory level
- state transitions
- search space
- learning algorithm