Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards.

Published in: NeurIPS (2023)
