Organographs - Multi-faceted Hierarchical Categorization of Web Documents.
Rodrigo Dias Arruda Senra, Claudia Bauzer Medeiros
Published in: WEBIST (2011)
Keyphrases
- web documents
- multi-faceted
- information extraction
- semi-structured
- HTML documents
- web search engines
- document representation
- text categorization
- related web pages
- keywords
- web content
- document classification
- web pages
- textual information
- web data
- subspace projections
- vector space model
- hierarchical structure
- document collections
- content similarity
- learning algorithm
- link structure
- knowledge discovery