An English-translated parallel corpus for the CJK Wikipedia collections.
Ling-Xiang Tang, Shlomo Geva, Andrew Trotman
Published in: ADCS (2012)
Keyphrases
- parallel corpus
- machine translation system
- document collections
- target language
- source language
- sentence pairs
- cross lingual
- machine translation
- cross language information retrieval
- cross language
- statistical machine translation
- query translation
- word alignment
- language independent
- parallel corpora
- document clustering
- information retrieval
- knowledge base
- metadata
- information retrieval systems
- translation model
- wikipedia articles
- information extraction
- foreign language
- document retrieval
- semantic information
- test collection
- named entities
- digital libraries
- natural language