The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling.
Joey Öhman, Severine Verlinden, Ariel Ekgren, Amaru Cuba Gyllensten, Tim Isbister, Evangelia Gogoulou, Fredrik Carlsson, Magnus Sahlgren
Published in: CoRR (2023)
Keyphrases
- language modeling
- language model
- information retrieval
- retrieval model
- n-gram
- query expansion
- cross lingual
- probabilistic model
- statistical language models
- text classification
- relevance model
- document retrieval
- information retrieval systems
- translation model
- TREC collections
- sentence retrieval
- pseudo feedback
- test collection
- vector space model