Tight Policy Regret Bounds for Improving and Decaying Bandits.

Hoda Heidari, Michael J. Kearns, Aaron Roth

Published in: IJCAI (2016)

Keyphrases

regret bounds
lower bound
upper bound
multi-armed bandit
linear regression
online learning
optimal solution
worst case
reinforcement learning
optimal policy
feature selection