Near-optimality for infinite-horizon restless bandits with many arms.
Xiangyu Zhang, Peter I. Frazier
Published in: CoRR (2022)
Keyphrases
- infinite horizon
- optimal control
- multi-armed bandits
- average cost
- multi-armed bandit problems
- finite horizon
- dynamic programming
- holding cost
- production planning
- bandit problems
- long run
- stochastic demand
- control strategy
- Markov decision problems
- Markov decision processes
- optimal policy
- single item
- partially observable
- Markov decision process
- semi-Markov
- reinforcement learning
- Dec-POMDPs
- multi-armed bandit
- policy iteration
- inventory models
- average reward
- stochastic systems
- fixed cost
- lost sales
- stationary policies
- planning horizon
- inventory policy
- linear programming
- decision-theoretic