An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets.
Bee Wah Yap, Khatijahhusna Abd Rani, Hezlin Aryani Abd Rahman, Simon Fong, Zuraida Khairudin, Nik Nik Abdullah
Published in: DaEng (2013)
Keyphrases
- imbalanced datasets
- ensemble methods
- majority class
- class imbalance
- imbalanced data
- base classifiers
- class distribution
- minority class
- ensemble learning
- cost sensitive
- base learners
- learning from imbalanced data
- prediction accuracy
- decision trees
- random forest
- cost sensitive learning
- benchmark datasets
- random forests
- machine learning methods
- concept drift
- active learning
- ensemble classifier
- binary classification
- generalization ability
- feature selection
- sampling methods
- classification error
- machine learning
- multi class
- naive bayes
- misclassification costs
- weak classifiers
- training samples
- meta learning
- learning scheme
- data distribution
- classification trees
- test set
- high dimensionality
- generalization error
- training set
- feature subset
- learning algorithm