PERL: Parameter Efficient Reinforcement Learning from Human Feedback.
Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Simral Chaudhary, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon
Published in: CoRR (2024)