Stabilizing RLHF through Advantage Model and Selective Rehearsal.

Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
Published in: CoRR (2023)