2-Way Text Classification for Harmful Web Documents.
Youngsoo Kim, Taekyong Nam, Dongho Won. Published in: ICCSA (2) (2006)
Keyphrases
- web documents
- text classification
- bag of words
- web pages
- machine learning
- n-gram
- information extraction
- text categorization
- text mining
- document classification
- feature selection
- semi-structured
- text data
- text documents
- labeled data
- web data
- keywords
- web search engines
- textual information
- vector space model
- html documents
- kNN
- document representation
- link structure
- web content
- classify documents
- structured documents
- semantic features
- knowledge discovery
- active learning
- information retrieval