Experiments with Infinite-Horizon, Policy-Gradient Estimation
Peter L. Bartlett, Jonathan Baxter, Lex Weaver
Published in: CoRR (2011)
Keyphrases
- infinite horizon
- gradient estimation
- optimal policy
- finite horizon
- stochastic demand
- Markov decision process
- long run
- variance reduction
- optimal control
- partially observable
- production planning
- Markov decision processes
- dynamic programming
- single item
- average cost
- finite state
- Markov decision problems
- state space
- fixed cost
- policy iteration
- actor critic
- single product
- holding cost
- state dependent
- lost sales
- lead time
- decision problems
- periodic review
- reinforcement learning
- inventory policy
- optimal production
- ordering cost
- lot size
- machine learning
- inventory control
- linear program
- optimal solution
- learning algorithm