Optimistic Whittle Index Policy: Online Learning for Restless Bandits.
Kai Wang, Lily Xu, Aparna Taneja, Milind Tambe
Published in: AAAI (2023)
Keyphrases
- online learning
- semi-Markov
- e-learning
- distance education
- regret bounds
- online course
- optimal control
- distance learning
- higher education
- optimal policy
- blended learning
- computer-mediated
- indexing method
- database
- active learning
- B-tree
- asymptotically optimal
- multi-armed bandit
- indexing techniques
- stochastic systems
- conservation laws
- online learning environments
- policy makers
- Markov decision processes
- data sets