The Class Imbalance Problem in Construction of Training Datasets for Authorship Attribution.
Urszula Stanczyk
Published in: ICMMI (2015)
Keyphrases
- class imbalance
- training dataset
- authorship attribution
- minority class
- class distribution
- training data
- active learning
- training set
- cost sensitive
- class labels
- concept drift
- high dimensionality
- feature selection
- training samples
- plagiarism detection
- source code
- database
- writing style
- data streams
- text mining
- supervised learning
- small number
- multi class
- learning environment
- decision trees
- data mining
- data sets