Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification: A Case Study on Daniel Sander's Wörterbuch der Deutschen Sprache.
Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner, Frank Puppe
Published in: DATeCH (2019)
Keyphrases
- preprocessing
- supervised machine learning
- pattern recognition
- classification accuracy
- image classification
- support vector machine svm
- printed documents
- machine learning
- classification algorithm
- class labels
- support vector machine
- text recognition
- support vector
- decision trees
- feature selection
- character recognition
- classification models
- text retrieval
- classification method
- historical manuscripts
- text classification
- text mining
- supervised learning
- feature vectors
- digital libraries
- keywords
- post processing
- classification scheme
- optical character recognition
- document analysis
- training set
- scanned documents
- text extraction