Categorization of Web Documents Using Character Encodings.
Nathaniel Gustafson, SeungJin Lim, Yiu-Kai Ng
Published in: ICDAT (2005)
Keyphrases
- web documents
- information extraction
- web search engines
- text categorization
- keywords
- semi-structured
- web pages
- textual information
- document classification
- vector space model
- web logs
- unstructured documents
- feature selection
- focused crawling
- structured documents
- link structure
- web content
- automatic classification
- database systems