Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion.
Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist
Published in: CoRR (2024)