Keyphrases
- web documents
- document classification
- semi-structured
- web pages
- information extraction
- keywords
- html documents
- document representation
- vector space model
- structured documents
- web content
- link structure
- web search engines
- textual information
- focused crawling
- web data
- content similarity
- unstructured documents
- tree-structured patterns