How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán
Published in: AMTA (2022)
Keyphrases
- machine translation
- language specific
- language resources
- multilingual documents
- target language
- parallel corpus
- cross lingual
- machine translation system
- cross language information retrieval
- language independent
- language processing
- comparable corpora
- Chinese English
- natural language
- natural language processing
- source language
- statistical machine translation
- multilingual retrieval
- linguistic resources
- bilingual dictionaries
- word alignment
- information extraction
- parallel corpora
- cross lingual information retrieval
- query translation
- word sense disambiguation
- bilingual lexicon
- natural language generation
- lexical knowledge
- phrase based SMT
- word level
- cross language
- finite state transducers
- multilingual information retrieval
- co occurrence