Publication: Whittle index based Q-learning for restless bandits with average reward.