A Probabilistic Description-Oriented Approach for Categorizing Web Documents.
Norbert Gövert, Mounia Lalmas, Norbert Fuhr
Published in: CIKM (1999)
Keyphrases
- web documents
- web pages
- information extraction
- semi structured
- web search engines
- document classification
- probabilistic model
- web content
- vector space model
- high level
- keywords
- html documents
- text categorization
- document representation
- information retrieval
- textual information
- web data
- search engine
- data mining
- topic specific
- tree structured patterns
- semistructured documents