How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine - Final Notes on Development and Evaluation.
Mika Koistinen, Kimmo Kettunen, Jukka Kervinen
Published in: LCT (2017)
Keyphrases
- optical character recognition
- open source
- character recognition
- text recognition
- document images
- historical manuscripts
- OCR systems
- character segmentation
- handwriting recognition
- printed documents
- historical documents
- scanned documents
- evaluation method
- page segmentation
- word spotting
- handwritten document images
- image binarization
- evaluation methods
- evaluation metrics
- case study
- error correction
- news articles
- video data
- source code
- real time