The impact of imbalanced training data on machine learning for author name disambiguation.
Jinseok Kim, Jenna Kim
Published in: Scientometrics (2018)
Keyphrases
- machine learning
- training data
- learning algorithm
- decision trees
- supervised learning
- machine learning algorithms
- class distribution
- pattern recognition
- computer vision
- data mining
- training instances
- training examples
- learning systems
- computer science
- training set
- machine learning methods
- test data
- classification accuracy
- support vector machine
- text classification
- text mining
- knowledge discovery
- inductive learning
- inductive logic programming
- domain knowledge
- semi-supervised learning
- active learning
- class labels
- test cases
- statistical methods
- test set
- prior knowledge
- generalization error
- training dataset
- naive bayes