Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes.
Yichun Hu, Nathan Kallus, Xiaojie Mao
Published in: CoRR (2019)
Keyphrases
- regret bounds
- loss function
- multi-armed bandit problems
- multi-armed bandits
- multi-armed bandit
- contextual information
- lower bound
- bandit problems
- online learning
- objective function
- expert advice
- context sensitive
- stochastic systems
- worst case
- machine learning
- upper bound
- regret minimization
- confidence bounds
- digital divide
- parametric models
- context aware
- pairwise
- image sequences