A Novel Weighted Phrase-Based Similarity for Web Documents Clustering.
Ruilong Yang, Qingsheng Zhu, Yunni Xia
Published in: J. Softw. (2011)
Keyphrases
- web documents
- content similarity
- information extraction
- web pages
- semi-structured
- web search engines
- clustering algorithm
- document classification
- similarity function
- clustering method
- keywords
- web logs
- k-means
- machine translation
- vector space model
- similarity measure
- structured documents
- focused crawling
- returned by a search engine
- distance measure
- web content
- document representation
- similarity metric
- data points
- database systems
- html documents
- document similarity
- document clustering
- text classification
- link structure
- website
- information retrieval
- machine learning