Keyphrases
- web documents
- content similarity
- information extraction
- semi-structured
- document classification
- web pages
- web search engines
- clustering algorithm
- web content
- clustering method
- keywords
- HTML documents
- web data
- k-means
- document clustering
- prefetching
- link structure
- focused crawling
- data points
- document representation
- data mining
- returned by a search engine
- structured documents
- query language
- learning algorithm