Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management.
Robert B. Washburn, Michael K. Schneider
Published in: J. Adv. Inf. Fusion (2008)
Keyphrases
- optimal policy
- scheduling problem
- Markov decision processes
- multiarmed bandit
- decision problems
- reinforcement learning
- finite horizon
- long run
- finite state
- infinite horizon
- multistage
- dynamic programming
- state space
- optimal control
- dynamic programming algorithms
- learning algorithm
- average cost
- policy iteration
- average reward
- average reward reinforcement learning
- initial state
- finite number
- semi-Markov decision processes
- multi agent
- serial inventory systems