Optical Character Recognition, Word Segmentation, Sentence Segmentation, and Information Extraction for Historical and Literature Texts in Classical Chinese.
Chao-Lin Liu
Published in: ROCLING (2020)
Keyphrases
- word segmentation
- handwritten document images
- optical character recognition
- information extraction
- handwriting recognition
- historical manuscripts
- word level
- natural language
- OCR systems
- historical documents
- word recognition
- text segmentation
- Chinese word segmentation
- Chinese text
- text summarization
- n-gram
- character recognition
- cross-lingual
- document analysis
- language independent
- text documents
- handwritten documents
- text classification
- machine translation
- information retrieval
- document images
- question answering
- named entity recognition
- text mining
- POS tagging
- natural language processing
- word spotting
- Chinese text retrieval
- page segmentation
- part of speech
- language modeling
- sparse data
- text processing
- word sense disambiguation