Near-optimal Per-Action Regret Bounds for Sleeping Bandits.

Quan Nguyen, Nishant A. Mehta

Published in: CoRR (2024)

Keyphrases

regret bounds
multi-armed bandit
lower bound
online learning
linear regression
upper bound
reinforcement learning
probabilistic model
least squares
linear predictors
optimal solution
nearest neighbor
data dependent
Bregman divergences