A dynamic programming strategy to balance exploration and exploitation in the bandit problem.

Olivier Caelen, Gianluca Bontempi

Published in: Ann. Math. Artif. Intell. (2010)

Keyphrases

dynamic programming
exploration exploitation
exploration strategy
stereo matching
search strategies
reinforcement learning
active learning
optimal strategy
search strategy
coarse to fine
exploration exploitation tradeoff
wireless sensor networks
selection strategy
greedy algorithm
optimal control
machine learning
pairwise
computer vision
information retrieval