Building and Improving an OCR Classifier for Republican Chinese Newspaper Text.
Matthias Arnold, Konstantin Henke
Published in: DHd (2022)
Keyphrases
- Chinese text
- text recognition
- optical character recognition
- printed documents
- text summarization
- document analysis
- document processing
- OCR systems
- text extraction
- training data
- text retrieval
- information retrieval
- text mining
- keyword extraction
- scanned documents
- document images
- English text
- classification algorithm
- classification method
- support vector machine
- character recognition
- text data
- feature space
- text classifiers
- feature selection
- text lines
- page layout
- Chinese texts
- decision trees
- text documents
- web documents
- training samples
- training set
- writing style
- lexical features
- preprocessing
- keywords
- learning algorithm
- text information
- text regions
- error correction
- SVM classifier
- natural language processing