Experiments with Infinite-Horizon, Policy-Gradient Estimation.
Jonathan Baxter, Peter L. Bartlett, Lex Weaver
Published in: J. Artif. Intell. Res. (2001)
Keyphrases
- infinite horizon
- gradient estimation
- optimal policy
- finite horizon
- long run
- optimal control
- Markov decision process
- stochastic demand
- dynamic programming
- production planning
- Markov decision processes
- average cost
- variance reduction
- partially observable
- single item
- state space
- policy iteration
- fixed cost
- decision problems
- periodic review
- Markov decision problems
- reinforcement learning
- inventory control
- holding cost
- lost sales
- inventory level
- machine learning
- partially observable Markov decision processes
- objective function
- actor critic
- random variables
- inventory policy
- ordering cost
- lead time