Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback.
Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban
Published in: CoRR (2019)
Keyphrases
- multi-armed bandit
- regret bounds
- learning algorithm
- supervised learning
- unsupervised learning
- relevance feedback
- machine learning
- contextual information
- semi-supervised
- random sampling
- reinforcement learning
- case study
- social networks
- user feedback
- context-dependent
- information retrieval
- database
- multi-armed bandits