Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons.
Arpit Agarwal, Shivani Agarwal, Sepehr Assadi, Sanjeev Khanna
Published in: COLT (2017)
Keyphrases
- multi-armed bandits
- preference learning
- pairwise comparisons
- supervised learning
- learning process
- online learning
- learning problems
- learning algorithm
- dynamic programming
- recommender systems
- active learning
- special case
- machine learning
- utility function
- learning tasks
- objective function
- reinforcement learning