WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia.
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán
Published in: EACL (2021)
Keyphrases
- natural language
- pairwise
- programming language
- intended meaning
- target language
- text corpus
- parallel processing
- data mining
- language learning
- hedge detection
- mining algorithm
- pattern mining
- named entities
- document collections
- text mining
- natural language text
- knowledge base
- syntactic parsing
- text generation
- semantic representations
- computational linguistics
- web mining
- language processing
- semantic relations
- word pairs
- natural language processing
- wordnet
- frequent patterns
- information retrieval