Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk.

Tianrui Chen, Aditya Gangrade, Venkatesh Saligrama

Published in: CoRR (2022)

Keyphrases

multi armed bandits
bandit problems
multi armed bandit problems
multi armed bandit
regret bounds
worst case
decision problems
machine learning
decision making
online learning
expected utility
loss function
information theoretic
learning theory
optimal strategy