Expected Regret and Pseudo-Regret are Equivalent When the Optimal Arm is Unique.

Daron Anderson, Douglas J. Leith

Published in: J. Mach. Learn. Res. (2022)

Keyphrases

total reward
multi-armed bandit problems
worst case
online learning
minimax regret
regret bounds
confidence bounds
lower bound
reinforcement learning
Markov decision processes
optimal policy
loss function
regret minimization
optimal design
average reward
expert advice
action selection
dynamic programming
game theory
optimal solution
weighted majority
bandit problems
online convex optimization
decision makers
online algorithms
infinite horizon
binary classification
upper confidence bound
machine learning