A Probabilistic Model for Classification of Multiple-Record Web Documents.
June Tang, Yiu-Kai Ng
Published in: OOIS (2000)
Keyphrases
- web documents
- document classification
- probabilistic model
- web pages
- semi-structured
- web search engines
- vector space model
- machine learning
- classification algorithm
- language model
- information extraction
- image classification
- text classification
- web data
- keywords
- automatic classification
- web content
- document representation
- html documents
- text categorization
- supervised learning
- feature space
- bayesian networks
- database
- web search
- link structure
- website
- focused crawling