Complete Policy Regret Bounds for Tallying Bandits.

Dhruv Malik, Yuanzhi Li, Aarti Singh

Published in: CoRR (2022)

Keyphrases

regret bounds
multi-armed bandit
lower bound
online learning
linear regression
upper bound
feature selection
Bayesian networks
information-theoretic
online convex optimization