Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret.
Ofer Dekel, Ambuj Tewari, Raman Arora
Published in: ICML (2012)
Keyphrases
- online learning
- lower bound
- prior knowledge
- bandit problems
- learning algorithm
- e-learning
- reinforcement learning
- learning community
- learning systems
- knowledge acquisition
- multi-armed bandit problems
- real time
- adaptive learning
- worst case
- markov chain
- learning problems
- active learning
- learning process
- regret bounds
- machine learning
- confidence bounds
- neural network