RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback.

Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, Colton Bishop, Victor Carbune, Abhinav Rastogi
Published in: CoRR (2023)