SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading.

Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Tërnava, Jianfeng Gao, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues
Published in: CoRR (2024)