Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning.

Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu
Published in: CoRR (2024)