Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits.
Vasilis Syrgkanis
Haipeng Luo
Akshay Krishnamurthy
Robert E. Schapire
Published in:
CoRR (2016)
Keyphrases
regret bounds
multi-armed bandit
lower bound
online learning
linear regression
learning algorithm
reinforcement learning
Bregman divergences