Using XPaths of inbound links to cluster template-generated web pages.
Tomas Grigalis, Antanas Cenys
Published in: Comput. Sci. Inf. Syst. (2014)
Keyphrases
- web pages
- link analysis
- link structure
- search engine
- website
- dynamically generated
- hierarchical structure
- template matching
- clustering algorithm
- web page classification
- data extraction
- keywords
- web search
- hyperlink structure
- web server
- web data
- web information extraction
- data records
- data objects
- automatically generated
- data clustering
- web search engines
- web documents
- web graph
- data sets
- anchor text
- web communities
- web content
- web mining
- content similarity
- web snippets