Classification of News Web Documents Based on Structural Features.
Shisanu Tongchim, Virach Sornlertlamvanich, Hitoshi Isahara
Published in: FinTAL (2006)
Keyphrases
- web documents
- structural features
- keywords
- tree-structured patterns
- web pages
- semi-structured
- tree kernels
- information extraction
- decision trees
- feature set
- machine learning
- HTML documents
- image classification
- feature extraction
- structural information
- feature space
- training data
- co-occurrence
- low-level
- training set
- secondary structure
- databases