Bandit problems and the exploration/exploitation tradeoff.

William G. Macready, David H. Wolpert

Published in: IEEE Trans. Evol. Comput. (1998)

Keyphrases

bandit problems
exploration/exploitation tradeoff
objective function
relevance feedback
multi-armed bandits
reinforcement learning
function approximation
decision problems
active learning
optimal solution
influence diagrams
decision makers
expected utility
Markov chain
data mining
artificial neural networks
artificial intelligence
information retrieval