Secrets of RLHF in Large Language Models Part II: Reward Modeling.

Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
Published in: CoRR (2024)