Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space.
Johannes Müller, Guido Montúfar
Published in: CoRR (2022)
Keyphrases
- policy iteration algorithm
- infinite horizon
- optimal policy
- Markov decision processes
- finite state
- Markov decision problems
- reinforcement learning
- policy iteration
- partially observable Markov decision processes
- long run
- optimal control
- partially observable
- finite horizon
- dynamic programming
- Markov decision process
- state space
- stochastic demand
- periodic review
- average cost
- decision problems
- demand distributions
- lead time
- Dec-POMDPs
- control policies
- inventory level
- lost sales
- state dependent
- learning algorithm
- reinforcement learning algorithms
- linear programming