Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models.
Terra Blevins, Luke Zettlemoyer
Published in: CoRR (2022)
Keyphrases
- cross lingual
- parallel corpus
- european languages
- machine translation
- language specific
- translation model
- cross language
- source language
- machine translation system
- natural language
- indian languages
- target language
- language independent
- linguistic resources
- cross lingual information retrieval
- word alignment
- monolingual
- bilingual dictionaries
- text classification
- probabilistic model
- machine learning
- monolingual retrieval
- parallel corpora
- machine learning algorithms
- query translation
- news articles
- comparable corpora
- language modeling
- cross language retrieval