Infinite-Horizon Policy-Gradient Estimation
Peter L. Bartlett, Jonathan Baxter
Published in: CoRR (2011)
Keyphrases
- infinite horizon
- gradient estimation
- optimal policy
- finite horizon
- stochastic demand
- long run
- Markov decision processes
- optimal control
- Markov decision process
- partially observable
- dynamic programming
- production planning
- variance reduction
- single item
- policy iteration
- holding cost
- reinforcement learning
- lead time
- fixed cost
- lost sales
- optimal production
- average cost
- periodic review
- decision problems
- Markov decision problems
- inventory policy
- control system
- machine learning
- expected cost
- state space
- objective function