On learning Whittle index policy for restless bandits with scalable regret.

Nima Akbarzadeh, Aditya Mahajan

Published in: CoRR (2022)

Keyphrases

online learning
learning systems
learning tasks
supervised learning
knowledge acquisition
learning algorithm
data structure
reinforcement learning
learning process
upper bound
worst case
learning experience
multi-armed bandits