The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies.
Victor Boone
Published in: CoRR (2023)
Keyphrases
- multi-armed bandit problems
- regret bounds
- multi-armed bandit
- bandit problems
- lower bound
- stochastic systems
- multi-armed bandits
- control policies
- online learning
- expert advice
- optimal policy
- linear regression
- stochastic models
- upper bound
- index structure
- loss function
- Monte Carlo
- reward function
- total reward
- reinforcement learning
- sliding window
- base stock policies