Filtering and rescoring the CCMatrix corpus for Neural Machine Translation training.
Antoni Oliver González, Sergi Alvarez
Published in: EAMT (2023)
Keyphrases
- machine translation
- statistical machine translation
- training corpus
- machine translation system
- parallel corpus
- parallel corpora
- Chinese-English
- natural language processing
- cross lingual
- language independent
- POS tagging
- language processing
- natural language generation
- cross language information retrieval
- information extraction
- word alignment
- natural language
- comparable corpora
- word sense disambiguation
- translation model
- target language
- word level
- query translation
- language resources
- information retrieval
- expert systems
- monolingual
- bilingual lexicon
- machine transliteration