An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem

Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan

Published in: NeurIPS (2022)

Keyphrases

asymptotically optimal
dynamic programming
computational complexity
optimal solution
probabilistic model
worst case
learning algorithm
objective function
Bayesian networks
search space
NP-hard
simulated annealing