OCR Alternatives for Electronic Publishing of Digitised Documents.
Stefan Pletschacher
Published in: ELPUB (2005)
Keyphrases
- document processing
- printed documents
- scanned documents
- optical character recognition
- document images
- document analysis
- document collections
- information retrieval
- page layout
- information retrieval systems
- document image retrieval
- OCR systems
- document classification
- web documents
- text documents
- post processing
- legal documents
- word spotting
- XML documents
- character recognition
- document retrieval
- keywords
- retrieval systems
- recognition errors
- error correction
- document representation
- text retrieval
- relevant documents
- text lines
- vector space
- preprocessing
- database
- information extraction
- decision makers
- alternative approaches
- vector space model
- free text
- document clustering