Thompson Sampling with Time-Varying Reward for Contextual Bandits

Cairong Yan, Hualu Xu, Haixia Han, Yanting Zhang, Zijian Wang

Published in: DASFAA (2) (2023)

Keyphrases

multi-armed bandit
contextual information
multi-armed bandits
reinforcement learning
sample size
neural network
data mining
monte carlo
random sampling
stochastic systems
real world
genetic algorithm
lower bound
sampling algorithm
sampling strategies
bandit problems