Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization.
Audrey Huang
Wenhao Zhan
Tengyang Xie
Jason D. Lee
Wen Sun
Akshay Krishnamurthy
Dylan J. Foster
Published in:
CoRR (2024)
Keyphrases
chi squared
information gain
optimization algorithm
optimization problems
global optimization
user preferences
information retrieval
learning algorithm
kullback leibler