Keyphrases
- web documents
- semi-structured
- content similarity
- web pages
- clustering method
- clustering algorithm
- information extraction
- prefetching
- document classification
- k-means
- keywords
- web data
- web search engines
- link structure
- vector space model
- textual information
- structured documents
- html documents
- geographic information
- web content
- search engine
- document representation
- focused crawling
- tree-structured patterns
- returned by a search engine