Nearly Optimal Policy Optimization with Stable at Any Time Guarantee.
Tianhao Wu, Yunchang Yang, Han Zhong, Liwei Wang, Simon S. Du, Jiantao Jiao
Published in: CoRR (2021)
Keyphrases
- optimal policy
- decision problems
- long run
- state space
- infinite horizon
- reinforcement learning
- dynamic programming
- state dependent
- Markov decision processes
- finite horizon
- multistage
- average reward
- Bayesian reinforcement learning
- sufficient conditions
- Markov decision process
- control policies
- average cost
- finite state
- stochastic demand
- periodic review
- decision making
- lost sales
- serial inventory systems
- inventory level
- stochastic optimization
- policy iteration
- partially observable
- production system
- cost function