Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk.
David Simchi-Levi, Zeyu Zheng, Feng Zhu
Published in: CoRR (2023)
Keyphrases
- regret bounds
- trade-off
- multi-armed bandit
- worst case
- online learning
- expected loss
- lower bound
- power law
- multi-armed bandits
- upper bound
- conditional expectation
- heavy-tailed
- stochastic systems
- expected error
- expert advice
- random variables
- dynamic programming
- risk factors
- decision making
- linear regression
- least squares
- reinforcement learning