Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit.

Shintaro Nakamura, Masashi Sugiyama

Published in: CoRR (2023)

Keyphrases

real valued
multi armed bandit
multi armed bandits
reinforcement learning
complex valued
integer valued
information retrieval
ranking functions
real valued data
learning algorithm
text mining
sample size
regret bounds