Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem.

Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa

Published in: CoRR (2015)

Keyphrases

worst case
lower bound
optimal solution
regret bounds
competitive ratio
dynamic programming
NP-hard
computational complexity
k-means
upper bound
learning algorithm
detection algorithm
search space
probabilistic model
preprocessing
objective function
optimal cost
lower and upper bounds
globally optimal
multi-armed bandit
genetic algorithm
online algorithms
random sampling
approximation algorithms
linear programming
exhaustive search
linear regression
constant factor
branch and bound
Markov decision processes