HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback.
Ang Li, Qiugen Xiao, Peng Cao, Jian Tang, Yi Yuan, Zijie Zhao, Xiaoyuan Chen, Liang Zhang, Xiangyang Li, Kaitong Yang, Weidong Guo, Yukang Gan, Xu Yu, Daniell Wang, Ying Shan
Published in: CoRR (2024)