Extreme point characterization of constrained nonstationary infinite-horizon Markov decision processes with finite state space.
Ilbin Lee, Marina A. Epelman, H. Edwin Romeijn, Robert L. Smith
Published in: Oper. Res. Lett. (2014)
Keyphrases
- markov decision processes
- finite state
- infinite horizon
- finite horizon
- non stationary
- optimal policy
- action space
- average cost
- state space
- inventory control
- reinforcement learning
- dynamic programming
- partially observable
- policy iteration
- action sets
- partially observable markov decision processes
- markov decision process
- search space
- average reward
- continuous state
- dec pomdps
- policy iteration algorithm
- reinforcement learning algorithms
- reward function
- control policies
- multistage
- markov chain
- search algorithm