Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards.
Amaury Gouverneur
Borja Rodríguez Gálvez
Tobias J. Oechtering
Mikael Skoglund
Published in:
ISIT (2023)
Keyphrases
multi armed bandit
multi armed bandits
regret bounds
reinforcement learning
online learning
linear regression
lower bound
gaussian mixture model
decision trees
maximum likelihood
upper bound
markov decision processes
gaussian distribution
gaussian mixture
least squares
bandit problems
bayesian networks