Clustering Web Documents Based on Knowledge Granularity.
Faliang Huang, Shichao Zhang
Published in: APWeb (2006)
Keyphrases
- web documents
- domain knowledge
- semi-structured
- web search engines
- information extraction
- web pages
- clustering method
- document classification
- content similarity
- clustering algorithm
- web content
- web logs
- knowledge discovery
- k-means
- web data
- textual information
- unstructured text
- unstructured documents
- structured documents
- search engine
- document representation
- vector space model
- background knowledge
- web search
- relational databases
- database systems