DPO Meets PPO: Reinforced Token Optimization for RLHF.

Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang
Published in: CoRR (2024)