A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.
Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Gang Pan
Published in: IEEE Trans. Neural Networks Learn. Syst. (2023)
Keyphrases
- sampling algorithm
- regret bounds
- random sampling
- lower bound
- online learning
- linear regression
- worst case
- active learning
- bandit problems
- multi-armed bandit
- upper bound
- maximum likelihood
- Chinese restaurant process
- upper confidence bound
- hyperparameters
- Bregman divergences
- sample size
- least squares
- reservoir sampling
- multi-armed bandit problems
- Markov chain Monte Carlo
- Gaussian distribution
- Markov chain
- training set
- machine learning