RLHF Workflow: From Reward Modeling to Online RLHF.

Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
Published in: CoRR (2024)
Keyphrases
  • real time
  • online learning
  • reinforcement learning
  • search engine
  • social networks
  • information systems
  • website
  • reward function
  • modeling method
  • document management