Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Yichun Hu, Nathan Kallus, Xiaojie Mao

Published in: COLT (2020)

Keyphrases

regret bounds
loss function
multi-armed bandit problems
multi-armed bandits
multi-armed bandit
contextual information
objective function
lower bound
online learning
expert advice
context-sensitive
bandit problems
worst case
smooth surfaces
linear regression
stochastic systems
information technology
support vector
neural network
context-dependent
smoothness constraint
contextual knowledge
computational complexity
confidence bounds
reinforcement learning