Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham Flaxman, Adam Tauman Kalai, H. Brendan McMahan

Published in: CoRR (2004)

Keyphrases

regret bounds
online convex optimization
online learning
multi-armed bandit
linear regression
lower bound
objective function
upper bound
image processing
cost function
reinforcement learning
loss function
long run