Image-Based Technique To Select Visually Salient Pages In Large Documents.
Fabrice Matulic
Published in: J. Digit. Inf. Manag. (2009)
Keyphrases
- web documents
- page layout
- textual content
- keywords
- information retrieval
- document classification
- web information
- document collections
- text documents
- web pages
- xml documents
- website
- information retrieval systems
- document retrieval
- metadata
- vector space model
- search engine
- web crawler
- html pages
- document representation
- electronic documents
- wikipedia pages
- www pages
- highly ranked
- hyperlink structure
- focused crawling
- topic modeling
- data objects
- document clustering
- query terms
- vector space
- web mining
- relevant documents
- user queries