Using Parallel Corpora to Automatically Generate Training Data for Chinese Segmenters in NTCIR PatentMT Tasks.
Jui-Ping Wang, Chao-Lin Liu
Published in: NTCIR (2013)
Keyphrases
- automatically generate
- English-Chinese
- parallel corpora
- training data
- cross language information retrieval
- bi directional
- cross lingual information retrieval
- machine translation
- automatically generated
- query translation
- language independent
- Chinese-English
- machine translation system
- cross lingual
- statistical machine translation
- labor intensive
- word pairs
- learning algorithm
- comparable corpora
- supervised learning
- out of vocabulary
- cross language
- translation model
- word segmentation
- sentence level
- transfer learning
- bilingual dictionaries