The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret.

Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Zhao

Published in: ICASSP (2011)

Keyphrases

multi-armed bandit
regret bounds
multi-armed bandits
reinforcement learning
lower bound
online learning
worst case
maximum likelihood
optimal control
decentralized decision making
e-learning
upper bound
loss function
bandit problems