Author identification: Using text sampling to handle the class imbalance problem.
Efstathios Stamatatos
Published in: Inf. Process. Manag. (2008)
Keyphrases
- author identification
- highly skewed
- class imbalance
- class distribution
- sampling methods
- minority class
- majority class
- imbalanced data
- active learning
- imbalanced datasets
- cost sensitive learning
- imbalanced class distribution
- high dimensionality
- cost sensitive
- concept drift
- misclassification costs
- random sampling
- training data
- feature selection
- naive bayes
- rare events
- machine learning
- imbalanced data sets
- feature extraction