Threshold selection for web-page classification with highly skewed class distribution.
Xiaofeng He, Lei Duan, Yiping Zhou, Byron Dom
Published in: WWW (2009)
Keyphrases
- web page classification
- threshold selection
- highly skewed
- class distribution
- text classification
- cost sensitive
- class imbalance
- misclassification costs
- training data
- training set
- unlabeled data
- naive bayes
- imbalanced datasets
- imbalanced data
- test set
- feature selection
- training samples
- majority class
- cost sensitive learning
- test data
- training examples
- text categorization
- multi class
- imbalanced data sets
- class labels
- minority class
- bayesian networks
- random sampling
- rare events
- concept drift
- base classifiers
- machine learning
- classification accuracy