Low-Resource Corpus Filtering using Multilingual Sentence Embeddings.
Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán, Holger Schwenk, Philipp Koehn
Published in: CoRR (2019)
Keyphrases
- parallel corpus
- sentence level
- sentence pairs
- training corpus
- cross lingual
- linguistic features
- language independent
- text generation
- digital libraries
- recognizing textual entailment
- semantic roles
- dimensionality reduction
- natural language
- noun phrases
- text corpus
- machine translation system
- document level
- word level
- low dimensional
- manually annotated
- resource allocation
- text classification
- relation extraction
- word frequency
- manifold learning
- machine translation
- predicate argument