Effective Parallel Corpus Mining using Bilingual Sentence Embeddings.
Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo Hernández Ábrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
Published in: CoRR (2018)
Keyphrases
- parallel corpus
- cross lingual
- sentence pairs
- cross language information retrieval
- language independent
- word alignment
- machine translation
- target language
- statistical machine translation
- query translation
- lexical knowledge
- source language
- parallel corpora
- machine translation system
- natural language
- parallel texts
- data mining
- vector space
- word pairs
- latent semantic analysis
- cross language
- document clustering
- knowledge discovery