Expected Regret and Pseudo-Regret are Equivalent When the Optimal Arm is Unique.
Daron Anderson, Douglas J. Leith
Published in: J. Mach. Learn. Res. (2022)
Keyphrases
- total reward
- multi-armed bandit problems
- worst case
- online learning
- minimax regret
- regret bounds
- confidence bounds
- lower bound
- reinforcement learning
- Markov decision processes
- optimal policy
- loss function
- regret minimization
- optimal design
- average reward
- expert advice
- action selection
- dynamic programming
- game theory
- optimal solution
- weighted majority
- bandit problems
- online convex optimization
- decision makers
- online algorithms
- infinite horizon
- binary classification
- upper confidence bound
- machine learning