Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization.

Published in: CoRR (2024)

Keyphrases