Secrets of RLHF in Large Language Models Part I: PPO.
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Published in: CoRR (2023)
Keyphrases
- language model
- language modeling
- n gram
- document retrieval
- probabilistic model
- information retrieval
- retrieval model
- test collection
- speech recognition
- query expansion
- language modelling
- context sensitive
- vector space model
- statistical language models
- smoothing methods
- query terms
- ad hoc information retrieval
- document length
- document ranking
- query specific
- translation model
- passage retrieval
- relevance model
- error rate