Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

Nadav Merlis, Shie Mannor

Published in: COLT (2019)

Keyphrases

batch size
multi-armed bandit
regret bounds
batch mode
single item
batch processing
poisson process
reinforcement learning
lower bound
finite horizon
active learning
supervised learning