Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
Eric Mitchell
Christopher D. Manning
Stefano Ermon
Chelsea Finn
Published in:
NeurIPS (2023)
Keyphrases
language model
probabilistic model
mixture model
maximum likelihood
em algorithm
document retrieval
language modeling
context dependent
relevance model
translation model
language model for information retrieval