Online EXP3 Learning in Adversarial Bandits with Delayed Feedback.

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose H. Blanchet

Published in: NeurIPS (2019)

Keyphrases

delayed feedback
learning algorithm
online learning
learning process
learning scheme
reinforcement learning
empirical studies
multi agent
upper bound
learning systems
learning problems
online training