Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus.
Simon Clematide, Lenz Furrer, Martin Volk
Published in: LREC (2016)
Keyphrases
- gold standard
- mechanical turk
- semi-automatic
- ground truth
- optical character recognition
- document images
- manual segmentation
- character recognition
- manually annotated
- post-processing
- preprocessing
- digital libraries
- text recognition
- augmented reality
- document processing
- monolingual retrieval
- cultural heritage
- recognition errors
- cross-language
- registration accuracy
- cross-lingual
- user feedback
- machine vision
- language model
- handwriting recognition
- open domain
- information retrieval