Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions.

Charles Riou, Junya Honda

Published in: ALT (2020)

Keyphrases

multi-armed bandit
exponential distributions
optimization problems
times faster
significant improvement
probability distribution
worst case
bandit problems
data structure
random sampling
sampling algorithm
learning algorithm
active learning
computational cost
state space
data mining
theoretical analysis
orders of magnitude
inverse reinforcement learning
objective function