Link-based similarity measures for the classification of Web documents.
Pável Calado, Marco Cristo, Marcos André Gonçalves, Edleno Silva de Moura, Berthier A. Ribeiro-Neto, Nivio Ziviani
Published in: J. Assoc. Inf. Sci. Technol. (2006)
Keyphrases
- web documents
- document classification
- similarity measure
- link structure
- semi-structured
- feature vectors
- information extraction
- web pages
- web search engines
- automatic classification
- keywords
- html documents
- document representation
- feature selection
- text classification
- classification algorithm
- feature space
- web data
- vector space model
- similarity scores
- web content
- focused crawling
- classify documents
- unstructured documents
- databases
- image classification
- web search
- text mining
- website
- machine learning