Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization.

Audrey Huang, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster
Published in: CoRR (2024)
Keyphrases
  • chi-squared
  • information gain
  • optimization algorithm
  • optimization problems
  • global optimization
  • user preferences
  • information retrieval
  • learning algorithm
  • Kullback-Leibler