Nearly Optimal Policy Optimization with Stable at Any Time Guarantee.
Tianhao Wu, Yunchang Yang, Han Zhong, Liwei Wang, Simon S. Du, Jiantao Jiao
Published in: ICML (2022)
Keyphrases
- optimal policy
- Markov decision processes
- finite horizon
- reinforcement learning
- state space
- decision problems
- infinite horizon
- dynamic programming
- long run
- state dependent
- multistage
- finite state
- sufficient conditions
- lost sales
- Markov decision process
- policy iteration
- average reward
- serial inventory systems
- average cost
- Bayesian reinforcement learning
- stochastic optimization
- inventory level
- partially observable Markov decision processes
- inventory control
- graphical models