Infinite-Horizon Policy-Gradient Estimation.
Jonathan Baxter, Peter L. Bartlett
Published in: J. Artif. Intell. Res. (2001)
Keyphrases
- infinite horizon
- gradient estimation
- optimal policy
- finite horizon
- stochastic demand
- long run
- dynamic programming
- Markov decision processes
- optimal control
- Markov decision process
- production planning
- variance reduction
- single item
- average cost
- partially observable
- fixed cost
- holding cost
- lead time
- policy iteration
- inventory policy
- state space
- periodic review
- optimal production
- reinforcement learning
- inventory control
- expected cost
- Markov decision problems
- actor critic