Logarithmic weak regret of non-Bayesian restless multi-armed bandit.
Haoyang Liu
Keqin Liu
Qing Zhao
Published in:
ICASSP (2011)
Keyphrases
machine learning
multi-armed bandit
regret bounds
reinforcement learning
multi-armed bandits
lower bound
online learning
linear regression
optimal control
upper bound
worst case
decentralized decision making
learning algorithm
model selection
active learning
maximum likelihood
dynamic programming
bayesian networks