SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading.
Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Tërnava, Jianfeng Gao, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues
Published in: CoRR (2024)
Keyphrases
- language model
- human experts
- language modeling
- n gram
- probabilistic model
- document retrieval
- language modelling
- context sensitive
- speech recognition
- retrieval model
- information retrieval
- statistical language models
- query expansion
- post processing
- smoothing methods
- language models for information retrieval
- document ranking
- relevance model
- pseudo relevance feedback
- artificial intelligence
- vector space model
- classification rules
- translation model
- test collection
- support vector machine
- feature extraction
- data mining