Login / Signup
On learning Whittle index policy for restless bandits with scalable regret.
Nima Akbarzadeh
Aditya Mahajan
Published in:
CoRR (2022)
Keyphrases
</>
online learning
learning systems
learning tasks
supervised learning
knowledge acquisition
learning algorithm
data structure
reinforcement learning
learning process
upper bound
worst case
learning experience
multi armed bandits