Title extraction and generation from OCR'd documents.
Kazem Taghva, Allen Condit, Steven E. Lumos, Julie Borsack, Thomas A. Nartker
Published in: DRR (2007)
Keyphrases
- printed documents
- document processing
- document analysis
- scanned documents
- optical character recognition
- keywords
- page layout
- ocr systems
- document collections
- document images
- text extraction
- information retrieval
- document clustering
- post processing
- document classification
- information extraction
- web documents
- text documents
- xml documents
- information retrieval systems
- document retrieval
- free text
- character recognition
- vector space model
- automatic extraction
- digital documents
- metadata
- generation method
- retrieval systems
- query biased
- digital libraries
- text recognition
- user queries
- text retrieval
- document image analysis
- document structure
- textual documents
- text lines
- vector space
- ranked list
- knowledge extraction
- text categorization
- legal documents
- word spotting
- scanned images
- natural language processing
- document type
- text analysis
- structured documents