Stochastic Multi-Armed Bandits with Control Variates.

Arun Verma, Manjesh Kumar Hanawal

Published in: CoRR (2021)

Keyphrases

multi armed bandits
learning algorithm
optimal solution
control system
least squares
sufficient conditions
monte carlo
optimal control
multi armed bandit