Near-optimal Per-Action Regret Bounds for Sleeping Bandits.
Quan Nguyen
Nishant A. Mehta
Published in:
CoRR (2024)
Keyphrases
regret bounds
multi-armed bandit
lower bound
online learning
linear regression
upper bound
reinforcement learning
probabilistic model
least squares
linear predictors
optimal solution
nearest neighbor
data dependent
Bregman divergences