Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits.
Guojun Xiong, Jian Li, Rahul Singh
Published in: AAAI (2022)
Keyphrases
- optimal policy
- asymptotically optimal
- finite horizon
- reinforcement learning
- Markov decision processes
- optimal control
- asymptotic optimality
- infinite horizon
- state space
- decision problems
- dynamic programming
- Markov decision process
- control policies
- multistage
- single product
- long run
- heavy traffic
- inventory control
- finite state
- state dependent
- holding cost
- average cost
- initial state
- index structure
- Markov decision problems
- sufficient conditions
- single item
- search algorithm
- lost sales
- setup cost
- partially observable Markov decision processes
- reward function
- inventory level
- graphical models
- lower bound
- ordering cost