Zyda: A 1.3T Dataset for Open Language Modeling.
Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan Pilault, Adam Ibrahim, James Whittington, Quentin Anthony
Published in: CoRR (2024)
Keyphrases
- language modeling
- language model
- information retrieval
- retrieval model
- query expansion
- probabilistic model
- n gram
- cross lingual
- text classification
- improvements in retrieval effectiveness
- document length
- statistical language models
- document retrieval
- co occurrence
- translation model
- relevance model
- pseudo feedback
- database systems