Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level.

Jie Liu, Zhanhui Zhou, Jiaheng Liu, Xingyuan Bu, Chao Yang, Hansen Zhong, Wanli Ouyang
Published in: CoRR (2024)
Keyphrases