β-DPO: Direct Preference Optimization with Dynamic β.

Published in: CoRR (2024)

Keyphrases