Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards.

Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord
Published in: CoRR (2023)
Keyphrases