RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs.
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
Published in: CoRR (2024)