An Optimization Methodology for Document Structure Extraction on Latin Character Documents.
Jisheng Liang, Ihsin T. Phillips, Robert M. Haralick
Published in: IEEE Trans. Pattern Anal. Mach. Intell. (2001)
Keyphrases
- structure extraction
- document structure
- text lines
- optical character recognition
- document layout
- printed documents
- document representation
- document images
- document collections
- structured documents
- relevant documents
- inex book track
- web documents
- information retrieval systems
- xml documents
- text documents
- document analysis
- information retrieval
- retrieval systems
- document retrieval
- electronic documents
- document clustering
- text summarization
- semantic information
- document type
- keywords
- handwritten documents
- query terms
- character recognition
- hierarchical structures
- relational databases
- vector space model
- user queries
- retrieved documents
- query expansion
- bag of words
- digital libraries
- ranked list
- web pages
- database