Employing Structural and Textual Feature Extraction for Semistructured Document Classification.
Mohammad Khabbaz, Keivan Kianmehr, Reda Alhajj
Published in: IEEE Trans. Syst. Man Cybern. Part C (2012)
Keyphrases
- document classification
- semi structured
- web documents
- feature extraction
- text mining
- keywords
- information extraction
- text documents
- textual data
- structured data
- text categorization
- semistructured databases
- text classification
- data model
- tree structured patterns
- semistructured documents
- web pages
- semistructured data
- web data sources
- classification algorithm
- machine learning
- natural language
- multimedia
- information retrieval
- image classification
- feature selection
- topic models
- information retrieval systems
- natural language processing
- knowledge discovery
- feature space
- document clustering
- training data
- artificial intelligence
- data sets