Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits.
Long Tran-Thanh, Archie C. Chapman, Alex Rogers, Nicholas R. Jennings
Published in: AAAI (2012)
Keyphrases
- lost sales
- optimal policy
- multi-armed bandits
- dynamic programming
- decision problems
- bandit problems
- finite horizon
- reinforcement learning
- Markov decision processes
- infinite horizon
- state space
- long run
- knapsack problem
- multistage
- finite state
- serial inventory systems
- policy iteration
- multi-armed bandit
- average cost
- sufficient conditions
- Markov decision process
- average reward reinforcement learning
- initial state
- partially observable Markov decision processes
- average reward
- dynamic programming algorithms
- influence diagrams
- fixed point
- objective function