A dynamic programming strategy to balance exploration and exploitation in the bandit problem.
Olivier Caelen, Gianluca Bontempi
Published in: Ann. Math. Artif. Intell. (2010)
Keyphrases
- dynamic programming
- exploration exploitation
- exploration strategy
- stereo matching
- search strategies
- reinforcement learning
- active learning
- optimal strategy
- search strategy
- coarse to fine
- exploration exploitation tradeoff
- wireless sensor networks
- selection strategy
- greedy algorithm
- optimal control
- machine learning
- pairwise
- computer vision
- information retrieval