The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies.

Published in: CoRR (2023)

Keyphrases

multi-armed bandit problems
regret bounds
multi-armed bandit
bandit problems
lower bound
stochastic systems
multi-armed bandits
control policies
online learning
expert advice
optimal policy
linear regression
stochastic models
upper bound
index structure
loss function
Monte Carlo
reward function
total reward
reinforcement learning
sliding window
base stock policies