WPO: Enhancing RLHF with Weighted Preference Optimization.

Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, Sathish Reddy Indurthi, Sanqiang Zhao, Kaiqiang Song, Silei Xu, Chenguang Zhu
Published in: CoRR (2024)