LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks.

Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K. Surikuchi, Ece Takmaz, Alberto Testoni
Published in: CoRR (2024)
Keyphrases