OCR of a Mixed Corpus: Early Printings and Manuscripts of Martianus Capella.
Manuel Ayuso
Published in: DATeCH (2017)
Keyphrases
- optical character recognition
- historical manuscripts
- post processing
- document images
- text recognition
- character recognition
- cultural heritage
- manually annotated
- data sets
- preprocessing
- test set
- document processing
- supervised machine learning
- recognition errors
- OCR systems
- open domain
- scanned documents
- historical documents
- printed documents
- real world
- spoken dialog
- neural network