Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation.
Nikolay Bogoychev, Rico Sennrich
Published in: CoRR (2019)
Keyphrases
- synthetic data
- machine translation
- cross lingual
- information extraction
- language independent
- real world
- real image data
- statistical machine translation
- language processing
- natural language processing
- cross language information retrieval
- target language
- word alignment
- data sets
- natural language generation
- language resources
- word sense disambiguation
- natural language
- brazilian portuguese
- chinese english
- multilingual documents
- domain specific
- synthetic datasets
- machine translation system
- parallel corpora
- machine learning
- finite state transducers