Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem.
Masrour Zoghi
Shimon Whiteson
Rémi Munos
Maarten de Rijke
Published in:
CoRR (2013)
Keyphrases
upper confidence bound
contextual bandit
reinforcement learning
markov chain