Extracting the Latent Hierarchical Structure of Web Documents.
Michael A. El-Shayeb, Samhaa R. El-Beltagy, Ahmed A. Rafea
Published in: SITIS (2006)
Keyphrases
- web documents
- hierarchical structure
- web pages
- hierarchically structured
- hierarchical structures
- hierarchical classification
- information extraction
- hierarchical organization
- web search engines
- semi-structured
- tree structure
- web directories
- web data
- vector space model
- keywords
- web content
- document representation
- website
- search engine
- image representation
- link structure
- digital libraries
- html documents
- structured documents
- focused crawling
- automatic extraction
- relational databases