M2D2: A Massively Multi-Domain Language Modeling Dataset.
Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
Published in: EMNLP (2022)
Keyphrases
- language modeling
- multi-domain
- language model
- retrieval model
- information retrieval
- cross-domain
- query expansion
- domain-specific
- n-gram
- probabilistic model
- text classification
- heterogeneous networks
- data analysis
- data sets
- relevance model
- document retrieval
- prior knowledge
- test collection
- document collections
- digital libraries
- learning algorithm
- co-occurrence
- general-purpose
- nearest neighbor