OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus.
Uwe Springmann, Anke Lüdeling
Published in: Digit. Humanit. Q. (2017)
Keyphrases
- text corpora
- text data
- document images
- case study
- wide coverage
- topic segmentation
- natural language processing
- historical data
- news corpus
- specific domains
- optical character recognition
- post processing
- knowledge extraction
- character recognition
- document processing
- annotated corpus
- text recognition
- machine learning
- statistical machine translation
- training corpus
- knowledge discovery
- preprocessing
- text corpus
- data analysis
- document corpus
- scale space
- text classification
- gray level