Poor Man's OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection.
Harald Hammarström, Shafqat Mumtaz Virk, Markus Forsberg
Published in: DATeCH (2017)
Keyphrases
- document collections
- digital libraries
- information retrieval systems
- character recognition
- document retrieval
- cross language
- information retrieval
- text retrieval
- database
- multimedia
- feature extraction
- error correction
- document image analysis
- document archives
- document summaries
- result lists
- ad hoc retrieval
- optical character recognition
- keywords
- machine learning
- databases