Complete Policy Regret Bounds for Tallying Bandits.

Dhruv Malik, Yuanzhi Li, Aarti Singh

Published in: COLT (2022)

Keyphrases

regret bounds
multi-armed bandit
lower bound
online learning
optimal policy
linear regression
reinforcement learning