Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards.
Amaury Gouverneur
Borja Rodríguez Gálvez
Tobias J. Oechtering
Mikael Skoglund
Published in: CoRR (2023)
Keyphrases
multi-armed bandit
multi-armed bandits
regret bounds
reinforcement learning
online learning
lower bound
upper bound
linear regression
Markov decision processes
sample size
maximum likelihood
Bregman divergences
state space
Gaussian mixture model
probability density function
random sampling
Gaussian mixture