Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards.
Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord
Published in: CoRR (2023)
Keyphrases
- pareto optimal
- fine tuned
- multi objective
- fine tuning
- multiple objectives
- multi objective optimization
- pareto optimal set
- expected utility
- reinforcement learning
- multi issue negotiation
- domain specific
- pareto optimality
- nsga ii
- nash equilibrium
- markov decision processes
- evolutionary algorithm
- optimal solution
- multiobjective optimization
- genetic algorithm
- objective function
- optimal policy
- optimization algorithm
- pareto optimal solutions
- decision making