Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk.

David Simchi-Levi, Zeyu Zheng, Feng Zhu

Published in: CoRR (2023)

Keyphrases

regret bounds
trade-off
multi-armed bandit
worst case
online learning
expected loss
lower bound
power law
multi-armed bandits
upper bound
conditional expectation
heavy-tailed
stochastic systems
expected error
expert advice
random variables
dynamic programming
risk factors
decision making
linear regression
least squares
reinforcement learning