Robust Preference Optimization through Reward Model Distillation.

Adam Fisch Jacob Eisenstein Vicky Zayats Alekh Agarwal Ahmad Beirami Chirag Nagpal Peter Shaw Jonathan Berant

Published in: CoRR (2024)

Keyphrases