Measuring Mass Text Digitization Quality and Usefulness: Lessons Learned from Assessing the OCR Accuracy of the British Library's 19th Century Online Newspaper Archive.
Simon Tanner, Trevor Muñoz, Pich Hemy Ros
Published in: D Lib Mag. (2009)
Keyphrases
- lessons learned
- case study
- future directions
- text recognition
- high accuracy
- printed documents
- text extraction
- document images
- real time
- participatory design
- optical character recognition
- character recognition
- post processing
- computational cost
- high quality
- information retrieval
- text retrieval
- online learning
- st century
- document analysis
- metadata
- database
- web documents
- text documents
- free text
- text mining
- grounded theory
- sufficiently accurate