Thompson Sampling with Time-Varying Reward for Contextual Bandits.
Cairong Yan
Hualu Xu
Haixia Han
Yanting Zhang
Zijian Wang
Published in:
DASFAA (2) (2023)
Keyphrases
multi-armed bandit
contextual information
multi-armed bandits
reinforcement learning
sample size
neural network
data mining
monte carlo
random sampling
stochastic systems
real world
genetic algorithm
lower bound
sampling algorithm
sampling strategies
bandit problems