Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits.

Aadirupa Saha, Shubham Gupta

Published in: ICML (2022)

Keyphrases

non-stationary
worst case
adaptive algorithms
learning algorithm
regret bounds
computationally efficient
multi-armed bandit
lower bound
online learning
expert advice
dynamic programming
loss function
stock price
confidence bounds