Reducing OCR Errors in Gothic-Script Documents.
Lenz Furrer, Martin Volk
Published in: ERCIM News (2011)
Keyphrases
- document processing
- printed documents
- scanned documents
- recognition errors
- document images
- information retrieval
- optical character recognition
- document analysis
- indian languages
- page layout
- post processing
- document collections
- ocr systems
- document classification
- document image retrieval
- xml documents
- text documents
- legal documents
- document clustering
- character recognition
- scanned images
- metadata
- arabic documents
- information retrieval systems
- vector space model
- textual documents
- preprocessing
- web documents
- word spotting
- document representation
- digital documents
- document retrieval
- text retrieval
- database
- keywords
- retrieval systems
- multi document summarization
- electronic documents
- text analysis
- text lines
- structured documents