Classification of XSLT-Generated Web Documents with Support Vector Machines.
Atakan Kurt, M. Engin Tozal
Published in: KDXD (2006)
Keyphrases
- web documents
- document classification
- html documents
- semi-structured
- information extraction
- web pages
- web search engines
- machine learning
- image classification
- keywords
- automatic classification
- text classification
- classification algorithm
- focused crawling
- feature selection
- web content
- text mining
- structured documents
- textual information
- vector space model
- information retrieval
- web data
- xml documents
- database
- active learning
- dynamically generated
- unstructured documents