Low-Resource Corpus Filtering Using Multilingual Sentence Embeddings.
Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán, Holger Schwenk, Philipp Koehn
Published in: WMT (3) (2019)
Keyphrases
- parallel corpus
- sentence level
- linguistic features
- training corpus
- cross lingual
- sentence pairs
- resource allocation
- cross language information retrieval
- text corpus
- recognizing textual entailment
- vector space
- manifold learning
- noun phrases
- machine translation system
- comparable corpora
- semantic roles
- digital libraries
- part of speech
- language independent
- information filtering
- parallel corpora
- text retrieval
- word frequency
- text generation
- machine translation
- natural language
- manually annotated
- text corpora
- predicate argument