Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters.
Yong Xia, Chun-Heng Wang, Ruwei Dai
Published in: ICCPOL (2006)
Keyphrases
- chinese english
- machine translation
- wordnet
- cross-language information retrieval
- linguistic resources
- text lines
- text documents
- statistical machine translation
- information retrieval systems
- text collections
- document images
- retrieval systems
- keywords
- tf-idf
- query terms
- cross-lingual
- relevant documents
- text categorization
- word level
- out-of-vocabulary
- information retrieval
- document analysis
- co-occurrence
- translation model
- natural language processing
- machine translation system
- parallel corpora
- natural language