Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach.
Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen
Published in: CoRR (2023)
Keyphrases
- infinite horizon
- online learning
- Markov decision processes
- finite horizon
- optimal policy
- average cost
- Markov decision process
- partially observable
- optimal control
- stochastic demand
- long run
- Markov decision problems
- dynamic programming
- production planning
- Dec-POMDPs
- policy iteration
- reinforcement learning
- real time
- state space
- inventory control
- action space
- lot size
- lead time