M2D2: A Massively Multi-domain Language Modeling Dataset.
Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
Published in: CoRR (2022)
Keyphrases
- language modeling
- multi-domain
- language model
- retrieval model
- information retrieval
- query expansion
- cross-domain
- domain-specific
- n-gram
- probabilistic model
- text classification
- relevance model
- heterogeneous networks
- digital libraries
- nearest neighbor
- general purpose
- information retrieval systems
- machine learning
- training data