Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards.
Hao Qin
Kwang-Sung Jun
Chicheng Zhang
Published in:
NeurIPS (2023)
Keyphrases
multi armed bandits
kullback leibler
multi armed bandit
bandit problems
distance measure
kullback leibler divergence
kl divergence
cross entropy
reinforcement learning
monte carlo
sample size
random sampling
decision problems
similarity measure
information theoretic