Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles.
Zhiwei Tang, Dmitry Rybin, Tsung-Hui Chang
Published in: ICLR (2024)
Keyphrases
- online learning
- directly optimize
- learning systems
- human learning
- active learning
- learning algorithm
- supervised learning
- motor skills
- unsupervised learning
- optimization algorithm
- learning tasks
- learning to rank
- ranking algorithm
- feedback mechanisms
- preference learning
- user feedback
- ranking functions
- global optimization
- human experts
- knowledge acquisition
- learning process
- training data