Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment.

Published in: CoRR (2023)
