Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards.
Alexandre Ramé, Guillaume Couairon, Corentin Dancette, Jean-Baptiste Gaya, Mustafa Shukor, Laure Soulier, Matthieu Cord. Published in: NeurIPS (2023)
Keyphrases
- pareto optimal
- fine tuned
- multi objective
- fine tuning
- multi objective optimization
- multiple objectives
- nash equilibrium
- multi issue negotiation
- domain specific
- pareto optimality
- optimal solution
- markov decision processes
- nsga ii
- pareto optimal set
- pareto optimal solutions
- expected utility
- social welfare
- reinforcement learning
- linear combination
- multiobjective optimization
- interpolation method
- game theory
- optimization algorithm
- genetic programming
- optimization problems
- general purpose
- evolutionary algorithm
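The title names the technique: take several models fine-tuned on different rewards and linearly interpolate their weights. The sketch below shows that interpolation in its simplest form. It makes three assumptions. First, every expert was fine-tuned from the same pre-trained initialization, so parameter names and shapes match. Second, the mixing coefficients are non-negative and sum to 1. Third, weights are stored as plain Python lists keyed by parameter name, purely for illustration. The names `rewarded_soup` and `Weights` are hypothetical and are not taken from the paper or any library.

```python
from typing import Dict, List, Sequence

# Hypothetical toy representation: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def rewarded_soup(experts: Sequence[Weights], lambdas: Sequence[float]) -> Weights:
    """Return the weighted average sum_i lambdas[i] * experts[i], parameter by parameter.

    Assumes all experts share one pre-trained initialization (same keys and
    lengths) and that the coefficients lie on the simplex.
    """
    if len(experts) != len(lambdas):
        raise ValueError("need one coefficient per expert")
    if any(lam < 0 for lam in lambdas) or abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    soup: Weights = {}
    for name in experts[0]:
        # Group the i-th entry of this parameter across all experts.
        columns = zip(*(expert[name] for expert in experts))
        soup[name] = [sum(lam * v for lam, v in zip(lambdas, col)) for col in columns]
    return soup


# Usage: two toy "experts", each tuned on a different reward.
expert_a = {"w": [1.0, 0.0], "b": [0.5]}
expert_b = {"w": [0.0, 1.0], "b": [1.5]}
for lam in (0.0, 0.5, 1.0):
    print(lam, rewarded_soup([expert_a, expert_b], [1 - lam, lam]))
```

Sweeping the coefficient from 0 to 1 yields a family of interpolated models between the two experts, one for each trade-off between the rewards. With real networks, the same averaging would be applied tensor by tensor, and each interpolated model would then be evaluated on every reward.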