Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation.
Xiaolin Wang, Masao Utiyama, Andrew M. Finch, Eiichiro Sumita
Published in: EMNLP (2014)
Keyphrases
- word segmentation
- POS tagging
- training corpus
- text classification
- unknown words
- n-gram
- handwriting recognition
- word recognition
- Chinese word segmentation
- language modeling
- language independent
- Chinese text retrieval
- Chinese text
- handwritten documents
- sparse data
- document analysis
- cross lingual
- language model
- machine learning
- statistical machine translation
- text categorization
- information extraction
- keywords
- topic tracking
- image processing
- feature selection
- information retrieval