Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation.
Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou. Published in: NeurIPS (2018)
Keyphrases
- infinite horizon
- finite horizon
- optimal control
- long run
- optimal policy
- dynamic programming
- Markov decision processes
- stochastic demand
- partially observable
- production planning
- single item
- average cost
- fixed cost
- inventory policy
- machine learning
- lost sales
- multi agent
- Markov decision process
- lead time
- production system
- Markov chain
- probability distribution
- inventory models
- reinforcement learning