Online convex optimization in the bandit setting: gradient descent without a gradient
Abraham Flaxman
Adam Tauman Kalai
H. Brendan McMahan
Published in:
CoRR (2004)
Keyphrases
regret bounds
online convex optimization
online learning
multi-armed bandit
linear regression
lower bound
objective function
upper bound
image processing
cost function
reinforcement learning
loss function
long run