OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus.
Uwe Springmann, Anke Lüdeling
Published in: CoRR (2016)
Keyphrases
- text corpora
- optical character recognition
- post processing
- case study
- wide coverage
- multiscale
- preprocessing
- natural language processing
- statistical machine translation
- document corpus
- document images
- test bed
- character recognition
- text corpus
- error correction
- parallel corpus
- training corpus
- knowledge extraction
- sentence pairs
- linguistic patterns
- data mining
- topic segmentation
- annotated corpus
- machine learning