Exploiting structural information for semi-structured document categorization.
Andrej Bratko, Bogdan Filipic. Published in: Inf. Process. Manag. (2006)
Keyphrases
- structural information
- semi-structured
- document categorization
- text mining
- text categorization
- structured data
- information extraction
- text classification
- meta-learning
- text documents
- web documents
- data model
- document clustering
- vector space model
- document representation
- textual data
- document classification
- latent semantic indexing
- inductive learning
- semantic information
- data sets
- machine learning
- real world
- document retrieval
- metadata
- active learning
- n-gram
- clustering method
- domain knowledge