Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus.
Simon Clematide, Lenz Furrer, Martin Volk
Published in: J. Lang. Technol. Comput. Linguistics (2018)
Keyphrases
- cultural heritage
- ground truth
- optical character recognition
- multimedia
- digital libraries
- ground truth data
- document images
- character recognition
- preprocessing
- high quality
- post processing
- error correction
- virtual museum
- knowledge society
- monolingual retrieval
- cidoc crm
- cross language
- recognition errors
- digital collections
- digital objects
- gold standard
- digital archives
- text recognition
- databases
- web based technologies
- mono lingual
- metadata schemas
- handwriting recognition