Complete Policy Regret Bounds for Tallying Bandits
Dhruv Malik
Yuanzhi Li
Aarti Singh
Published in:
COLT (2022)
Keyphrases
regret bounds
multi armed bandit
lower bound
online learning
optimal policy
linear regression
reinforcement learning