Catwalk: A Unified Language Model Evaluation Framework for Many Datasets.
Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge
Published in: CoRR (2023)
Keyphrases
- document collections
- language model
- evaluation framework
- information retrieval
- document retrieval
- test collection
- evaluation methodology
- language modeling
- query terms
- evaluation process
- retrieval model
- query expansion
- relevant documents
- n-gram
- benchmark datasets
- mixture model
- vector space model
- probabilistic model
- ad hoc information retrieval
- evaluation metrics
- evaluation measures
- smoothing methods
- semantic annotation
- translation model
- evaluation criteria
- training data