An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem.
Ryo Watanabe, Atsuyoshi Nakamura, Mineichi Kudo. Published in: Oper. Res. Lett. (2015)
Keyphrases
- upper bound
- bandit problems
- multi-armed bandit problems
- lower bound
- multi-armed bandit
- regret bounds
- worst case
- decision problems
- matching process
- upper confidence bound
- similarity assessment
- total reward
- matching algorithm
- optimal policy
- lower and upper bounds
- random sampling
- object recognition
- pattern matching
- single item
- keypoints
- expert advice
- image matching
- reinforcement learning algorithms
- expected utility
- objective function
- graph matching
- branch and bound algorithm