Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning.
Ahmadreza Moradipari
Mohammad Pedramfar
Modjtaba Shokrian Zini
Vaneet Aggarwal
Published in: NeurIPS (2023)
Keyphrases
reinforcement learning
multi armed bandit
bayesian networks
regret bounds
optimal policy
monte carlo
model free
learning algorithm
state space
maximum likelihood
active learning
multi class
nearest neighbor
sample size
learning problems
posterior probability