Knapsack based Optimal Policies for Budget-Limited Multi-Armed Bandits
Long Tran-Thanh, Archie C. Chapman, Alex Rogers, Nicholas R. Jennings
Published in: CoRR (2012)
Keyphrases
- optimal policy
- multi armed bandits
- dynamic programming
- Markov decision processes
- decision problems
- bandit problems
- finite horizon
- state space
- infinite horizon
- finite state
- long run
- reinforcement learning
- multistage
- Markov decision problems
- average cost
- dynamic programming algorithms
- sufficient conditions
- knapsack problem
- average reward
- serial inventory systems
- average reward reinforcement learning
- multi armed bandit
- policy iteration
- lost sales
- Markov decision process
- initial state
- optimal control
- reward function
- influence diagrams