Sample path sharing in simulation-based policy improvement.
Di Wu, Qing-Shan Jia, Chun-Hung Chen
Published in: ICRA (2014)
Keyphrases
- sample path
- policy iteration
- asymptotic analysis
- average reward
- fluid model
- Markov chain
- serial inventory systems
- Markov decision processes
- optimal policy
- lost sales
- model free
- fixed point
- large deviations
- policy evaluation
- reinforcement learning
- steady state
- least squares
- dynamic programming
- Markov decision process
- long run
- finite state