Tight Policy Regret Bounds for Improving and Decaying Bandits.
Hoda Heidari
Michael J. Kearns
Aaron Roth
Published in:
IJCAI (2016)
Keyphrases
regret bounds
lower bound
upper bound
multi-armed bandit
linear regression
online learning
optimal solution
worst case
reinforcement learning
optimal policy
feature selection