On the Evolution of Clusters of Near-Duplicate Web Pages.
Dennis Fetterly, Mark S. Manasse, Marc Najork
Published in: LA-WEB (2003)
Keyphrases
- web pages
- clustering algorithm
- website
- search engine
- hierarchical clustering
- hierarchical structure
- web search
- web content
- web search engines
- web page classification
- data records
- keywords
- web documents
- data points
- data clustering
- information retrieval
- data objects
- cluster analysis
- web server
- video clips
- link analysis
- fuzzy clustering
- cohesive subgroups
- web content mining
- clustering quality
- arbitrary shape
- web resources
- fuzzy c means
- web mining
- information extraction