Clustering Web Documents Based on Correlation of Hyperlinks.
Kou Takahashi, Takao Miura, Isamu Shioya
Published in: ICDE Workshops (2005)
Keyphrases
- web documents
- web pages
- content similarity
- semi-structured
- clustering algorithm
- clustering method
- information extraction
- html documents
- document classification
- vector space model
- k-means
- keywords
- web search engines
- web content
- prefetching
- focused crawling
- document clustering
- web data
- data points
- document representation
- databases
- returned by a search engine
- topic specific
- link structure
- web logs
- textual information
- website
- information retrieval