Keyphrases
- web documents
- text categorization
- web pages
- semi-structured
- document classification
- web search engines
- vector space model
- focused crawling
- document representation
- information extraction
- automatic classification
- textual information
- web data
- html documents
- keywords
- structured documents
- machine learning
- content similarity
- language model
- active learning
- search engine
- information retrieval