Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.

Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári

Published in: Theor. Comput. Sci. (2009)

Keyphrases

multi-armed bandits
exploration-exploitation tradeoff
bandit problems
neural network
feature selection
objective function
computational complexity
state space
knn
upper bound
sufficient conditions
learning tasks
function approximation
multi-armed bandit