Direct Preference Optimization: Your Language Model is Secretly a Reward Model.

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Published in: CoRR (2023)
Keyphrases
  • language model
  • probabilistic model
  • statistical model
  • search engine
  • probability distribution
  • web search
  • language modelling