Topic-Independent Web High-Quality Page Selection Based on K-Means Clustering.
Canhui Wang, Yiqun Liu, Min Zhang, Shaoping Ma. Published in: AIRS (2005)
Keyphrases
- website
- high quality
- web pages
- topic distillation
- focused crawler
- topic specific
- focused crawling
- web applications
- web communities
- web documents
- web users
- relevant pages
- web technologies
- web browsing
- web mining
- linked data
- web data
- keywords
- web crawler
- information sources
- page content
- classifying web pages
- home page
- web graph
- k-means
- data extraction
- web queries
- user generated content
- link analysis
- spectral clustering
- semantic web
- web log mining
- navigational behavior
- clustering algorithm
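The title names k-means clustering as the basis for selecting high-quality pages. This record does not say which page features the authors cluster on, how they initialize, or how they decide which cluster counts as "high quality". The block below is therefore only a generic sketch of standard Lloyd's k-means, not the paper's method. Assumptions, all hypothetical: each page is a small numeric feature vector (here a normalized in-link score and a normalized URL-depth score), centroids start at the first k points so the run is deterministic, and the toy data are invented for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means. Initial centroids are the first k points (deterministic)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                dim = len(members[0])
                centroids[c] = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
    return centroids, labels


# Hypothetical per-page features: (normalized in-link score, normalized URL depth).
pages: List[Vector] = [
    (0.90, 0.10), (0.10, 0.90), (0.85, 0.15),
    (0.80, 0.20), (0.15, 0.80), (0.05, 0.95),
]
centroids, labels = kmeans(pages, k=2)
print(labels)  # → [0, 1, 0, 0, 1, 1]
```

In this toy run, cluster 0 groups the heavily linked, shallow-URL pages. A selection step could then keep the cluster whose centroid scores best on such features. That rule is likewise an illustrative assumption, not something stated in this record.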