Do thesauri enhance rule-based categorization for OCR text?
Kazem Taghva, Jeffrey S. Coombs
Published in: DRR (2003)
Keyphrases
- text recognition
- optical character recognition
- printed documents
- document processing
- text extraction
- document categorization
- OCR systems
- document images
- document analysis
- information retrieval
- text mining
- data driven
- scanned documents
- printed text
- domain specific
- expert systems
- error correction
- text information
- text categorization
- post processing
- page layout
- keywords
- free text
- domain dependent
- text analysis
- text processing
- terminology extraction
- complex background
- character recognition
- text documents
- semantic information
- database
- search engine
- hidden Markov models
- automatic categorization
- natural language processing
- information retrieval systems
- semantic web
- text retrieval
- text data
- textual information
- text lines
- textual data
- string matching