UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem.

Peter Auer, Ronald Ortner
Published in: Period. Math. Hung. (2010)
Keyphrases
  • multi-armed bandit
  • regret bounds
  • reinforcement learning
  • linear regression
  • lower bound
  • online learning
  • upper bound
  • image sequences
  • training data
  • optimal solution
  • text classification