Correction of whitespace and word segmentation in noisy Pashto text using CRF.
Ijazul Haq, Weidong Qiu, Jie Guo, Peng Tang
Published in: Speech Commun. (2023)
Keyphrases
- word segmentation
- Chinese text
- document analysis
- conditional random fields
- word level
- n-gram
- Chinese word segmentation
- handwritten documents
- word recognition
- information retrieval
- language independent
- POS tagging
- text classification
- handwriting recognition
- text analysis
- information extraction
- Chinese text retrieval
- machine learning
- cross lingual
- language modeling
- pairwise
- keywords
- noisy environments
- text documents