It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF.

Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
Published in: CoRR (2024)
Keyphrases