Structure extraction from PDF-based book documents.
Liangcai Gao, Zhi Tang, Xiaofan Lin, Ying Liu, Ruiheng Qiu, Yongtao Wang
Published in: JCDL (2011)
Keyphrases
- structure extraction
- INEX book track
- document structure
- document layout
- PDF files
- structured documents
- document collections
- information retrieval
- XML documents
- relevant documents
- information retrieval systems
- web documents
- PDF documents
- text documents
- retrieval systems
- document representation
- text summarization
- document retrieval
- document clustering
- semantic information
- text lines
- electronic documents
- focused retrieval
- keywords
- metadata
- query expansion
- natural language processing
- web pages
- database