LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation.
Yongjing Yin, Jiali Zeng, Yafu Li, Fandong Meng, Yue Zhang
Published in: CoRR (2024)
Keyphrases
- machine translation
- data collection
- machine-readable dictionaries
- bilingual dictionaries
- parallel corpus
- language processing
- information extraction
- language independent
- data analysis
- cross-language information retrieval
- cross-lingual
- natural language processing
- language resources
- target language
- natural language generation
- statistical machine translation
- word sense disambiguation
- Brazilian Portuguese
- Chinese-English
- word alignment
- natural language
- query translation
- co-occurrence
- source language
- machine translation system
- text categorization
- finite state transducers