Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion.

Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist
Published in: CoRR (2024)
Keyphrases
  • policy gradient
  • semi-supervised
  • feature selection
  • learning algorithm
  • cost function
  • parametric optimization