Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions.
Kei Takemura
Shinji Ito
Daisuke Hatano
Hanna Sumita
Takuro Fukunaga
Naonori Kakimura
Ken-ichi Kawarabayashi
Published in:
CoRR (2021)
Keyphrases
regret bounds
lower bound
online learning
linear regression
payoff functions
multi-armed bandit
upper bound
reinforcement learning
nearest neighbor
Bregman divergences
online convex optimization