How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán
Published in: CoRR (2022)
Keyphrases
- machine translation
- language specific
- language resources
- multilingual documents
- target language
- cross lingual
- parallel corpus
- machine translation system
- language independent
- cross language information retrieval
- language processing
- natural language
- comparable corpora
- Chinese English
- bilingual dictionaries
- natural language processing
- source language
- parallel corpora
- information extraction
- statistical machine translation
- word alignment
- multilingual information retrieval
- query translation
- linguistic resources
- word sense disambiguation
- digital libraries
- bilingual lexicon
- natural language generation
- cross lingual information retrieval
- machine learning
- machine transliteration
- multilingual retrieval