Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR.
Donald Sturgeon
Published in: FLAIRS Conference (2017)
Keyphrases
- training data
- supervised learning
- optical character recognition
- text extraction
- post processing
- training set
- decision trees
- unsupervised learning
- data sets
- training dataset
- automatic extraction
- classification accuracy
- domain knowledge
- preprocessing
- learning algorithm
- information extraction
- domain adaptation
- test set
- training process
- document images
- test data
- text recognition
- error correction
- classification models
- class labels
- semi-supervised
- prior knowledge
- key phrase extraction
- training examples
- scanned documents
- chinese text
- handwriting recognition
- text summarization
- data driven
- active learning
- pairwise
- neural network