PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation.
Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen
Published in: CoRR (2021)
Keyphrases
- machine translation
- benchmark datasets
- cross lingual
- information extraction
- natural language processing
- language independent
- cross language information retrieval
- brazilian portuguese
- target language
- statistical machine translation
- word sense disambiguation
- natural language generation
- language processing
- chinese english
- word alignment
- natural language
- parallel corpora
- machine translation system
- word level
- language specific
- named entity recognition
- language resources
- mt evaluation
- parallel corpus
- query translation
- information retrieval
- n gram
- cross lingual information retrieval
- english chinese
- machine learning