Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards.
Vrettos Moulos
Published in: NeurIPS (2020)
Keyphrases
round robin
Kullback-Leibler
reinforcement learning
probabilistic model