Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback.

Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
Published in: CoRR (2024)
Keyphrases