Author Identification in Imbalanced Sets of Source Code Samples.

Evangelos Chatzicharalampous, Georgia Frantzeskou, Efstathios Stamatatos

Published in: ICTAI (2012)

Keyphrases

source code
author identification
software systems
open source
highly skewed
software maintenance
software projects
free software
plagiarism detection
minority class
training samples
class distribution
program understanding
imbalanced datasets
software evolution
data sets
class imbalance
imbalanced data
source files
information retrieval
naive bayes
document collections
information extraction
active learning
training set
decision trees
bug localization