Bandit Learning with Joint Effect of Incentivized Sampling, Delayed Sampling Feedback, and Self-Reinforcing User Preferences.
Tianchen Zhou
Jia Liu
Chaosheng Dong
Yi Sun
Published in:
ICLR (2022)
Keyphrases
user preferences
random sampling
hierarchical task networks
learning algorithm
supervised learning
learning process
reinforcement learning
active learning
collaborative filtering
online learning
user behavior
learning tasks
user feedback
recommendation systems
user profiles
user behaviour
preference learning