Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes.
Yichun Hu, Nathan Kallus, Xiaojie Mao
Published in: COLT (2020)
Keyphrases
- regret bounds
- loss function
- multi-armed bandit problems
- multi-armed bandits
- multi-armed bandit
- contextual information
- objective function
- lower bound
- online learning
- expert advice
- context-sensitive
- bandit problems
- worst case
- smooth surfaces
- linear regression
- stochastic systems
- information technology
- support vector
- neural network
- context-dependent
- smoothness constraint
- contextual knowledge
- computational complexity
- confidence bounds
- reinforcement learning