WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia.
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán
Published in: CoRR (2019)
Keyphrases
- natural language
- intended meaning
- target language
- programming language
- pairwise
- mining algorithm
- language learning
- knowledge discovery
- text generation
- parallel processing
- parallel implementation
- text mining
- pattern mining
- wikipedia articles
- source language
- semantic representations
- syntactic parsing
- hedge detection
- web mining
- named entities
- link structure
- language processing
- itemsets
- natural language text
- text corpus
- natural language processing
- probabilistic context-free grammars