Secrets of RLHF in Large Language Models Part I: PPO.

Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Published in: CoRR (2023)
Keyphrases