Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards.
Vrettos Moulos
Published in: CoRR (2020)
Keyphrases
Kullback-Leibler
confidence bounds
reinforcement learning
distance measure
statistical analysis
probability distribution