Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data.

Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
Published in: CoRR (2024)