Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach.
David Fleischhacker, Wolfgang Göderle, Roman Kern
Published in: CoRR (2024)
Keyphrases
- historical documents
- machine learning
- document images
- handwriting recognition
- optical character recognition
- pattern recognition
- neural network
- character recognition
- feature selection
- information extraction
- natural language processing
- historical manuscripts
- text classification
- word recognition
- handwritten document images
- printed documents
- document processing
- poor quality
- data quality
- text mining
- image analysis