Classification of Documents Based on the Structure of Their DOM Trees.
Peter Geibel, Olga Pustylnikov, Alexander Mehler, Helmar Gust, Kai-Uwe Kühnberger
Published in: ICONIP (2) (2007)
Keyphrases
- tree structure
- document classification
- decision trees
- XML documents
- pattern recognition
- pre-classified
- information retrieval
- tree structures
- classification algorithm
- document collections
- classification accuracy
- text classification
- automatic classification
- machine learning
- automatic categorization
- document categorization
- information retrieval systems
- structured information
- website
- web documents
- feature vectors
- feature extraction
- content and structure
- support vector
- support vector machine (SVM)
- class labels
- retrieval systems
- classification method
- document retrieval
- metadata
- web pages
- text categorization
- co-occurrence
- multi-class
- support vector machine
- feature space