On the Evolution of Clusters of Near-Duplicate Web Pages.
Dennis Fetterly, Mark S. Manasse, Marc Najork
Published in: J. Web Eng. (2004)
Keyphrases
- web pages
- search engine
- website
- clustering algorithm
- hierarchical structure
- data objects
- keywords
- web search
- data records
- web documents
- web search engines
- hierarchical clustering
- web content
- cluster analysis
- web content mining
- web page classification
- data clustering
- web server
- fuzzy clustering
- web data
- web resources
- link analysis
- bag of words
- web information extraction
- related web pages
- document clustering
- web users
- link structure
- video search
- data sets