Keyphrases
- web documents
- content similarity
- web pages
- information extraction
- semi-structured
- keywords
- HTML documents
- natural language
- web search engines
- similarity measure
- semantic similarity
- textual information
- document classification
- distance measure
- similarity metric
- web directories
- focused crawling
- dynamically generated
- web content
- web data
- data representation
- link structure
- structured documents
- distance function