Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings.
Mikel Artetxe, Holger Schwenk
Published in: CoRR (2018)
Keyphrases
- parallel corpus
- cross lingual
- cross language information retrieval
- language independent
- machine translation
- machine translation system
- sentence pairs
- word alignment
- query translation
- language modeling
- knowledge discovery
- cross language
- text mining
- target language
- data mining
- machine learning
- statistical machine translation
- latent semantic analysis
- dimensionality reduction
- low dimensional
- source language
- parallel corpora
- search engine
- news articles
- cross lingual information retrieval
- vector space