The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret.
Wenhan Dai
Yi Gai
Bhaskar Krishnamachari
Qing Zhao
Published in:
ICASSP (2011)
Keyphrases
multi armed bandit
regret bounds
multi armed bandits
reinforcement learning
lower bound
online learning
worst case
maximum likelihood
optimal control
decentralized decision making
e learning
upper bound
loss function
bandit problems