MC^2: A Multilingual Corpus of Minority Languages in China.
Chen Zhang, Mingxu Tao, Quzhe Huang, Jiuheng Lin, Zhibin Chen, Yansong Feng
Published in: CoRR (2023)
Keyphrases
- cross lingual
- language independent
- comparable corpora
- multi lingual
- parallel corpus
- multilingual information retrieval
- parallel corpora
- statistical machine translation
- machine translation system
- multilingual documents
- cross language information retrieval
- language specific
- sentence pairs
- machine translation
- language resources
- cross lingual information retrieval
- chinese english
- bilingual dictionaries
- language modeling
- linguistic resources
- text classification
- hong kong
- news articles
- word pairs
- natural language
- wide coverage
- training corpus
- multiword
- target language
- query translation
- n gram
- test set
- translation model
- manually annotated
- indian languages
- class distribution
- wikipedia articles
- labor intensive
- text collections
- information access
- expressive power
- databases