Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization.
Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Harshad Khadilkar, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, Nikesh Garera
Published in: CoRR (2024)