Asymptotic Performance of Thompson Sampling for Batched Multi-Armed Bandits.

Cem Kalkanli, Ayfer Özgür

Published in: IEEE Trans. Inf. Theory (2023)

Keyphrases

multi-armed bandits
multi-armed bandit
bandit problems
random sampling
machine learning
Monte Carlo
lower bound
dynamic programming
upper bound
Markov chain
sample size