The Non-Bayesian Restless Multi-Armed Bandit: A Case of Near-Logarithmic Regret

Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Zhao

Published in: CoRR (2010)

Keyphrases

multi-armed bandit
regret bounds
multi-armed bandits
reinforcement learning
worst case
online learning
lower bound
upper bound
maximum likelihood
maximum entropy
decentralized decision making
Bayesian networks
optimal solution