Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment.

Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
Published in: CoRR (2023)