Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards.
Hao Qin, Kwang-Sung Jun, Chicheng Zhang
Published in: CoRR (2023)
Keyphrases
- multi armed bandits
- kullback leibler
- multi armed bandit
- bandit problems
- kl divergence
- cross entropy
- distance measure
- kullback leibler divergence
- random sampling
- reinforcement learning
- sample size
- mutual information
- log likelihood
- density estimation
- decision problems
- information theoretic
- distance metric
- monte carlo
- maximum likelihood