Author Identification in Imbalanced Sets of Source Code Samples.
Evangelos Chatzicharalampous, Georgia Frantzeskou, Efstathios Stamatatos
Published in: ICTAI (2012)
Keyphrases
- source code
- author identification
- software systems
- open source
- highly skewed
- software maintenance
- software projects
- free software
- plagiarism detection
- minority class
- training samples
- class distribution
- program understanding
- imbalanced datasets
- software evolution
- data sets
- class imbalance
- imbalanced data
- source files
- information retrieval
- naive bayes
- document collections
- information extraction
- active learning
- training set
- decision trees
- bug localization