Automated subject classification of textual web documents.
Koraljka Golub. Published in: J. Documentation (2006)
Keyphrases
- web documents
- document classification
- textual information
- keywords
- information extraction
- semi-structured
- supervised learning
- web data
- web search engines
- text classification
- machine learning
- automatic classification
- web pages
- vector space model
- feature space
- web content
- visual content
- text categorization
- link structure
- domain specific
- training set
- website
- metadata
- feature selection