RLHF Workflow: From Reward Modeling to Online RLHF.
Hanze Dong
Wei Xiong
Bo Pang
Haoxiang Wang
Han Zhao
Yingbo Zhou
Nan Jiang
Doyen Sahoo
Caiming Xiong
Tong Zhang
Published in:
CoRR (2024)
Keyphrases
real time
online learning
reinforcement learning
search engine
social networks
information systems
website
reward function
modeling method
document management