No Discounted-Regret Learning in Adversarial Bandits with Delays.

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose H. Blanchet

Published in: CoRR (2021)

Keyphrases

online learning
reinforcement learning
learning process
learning algorithm
active learning
data sets
multi-agent
prior knowledge
supervised learning
knowledge acquisition
unsupervised learning
learning problems
multi-armed bandits