Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions.
Charles Riou, Junya Honda
Published in: ALT (2020)
Keyphrases
- multi-armed bandit
- exponential distributions
- optimization problems
- times faster
- significant improvement
- probability distribution
- worst case
- bandit problems
- data structure
- random sampling
- sampling algorithm
- learning algorithm
- active learning
- computational cost
- state space
- data mining
- theoretical analysis
- orders of magnitude
- inverse reinforcement learning
- objective function