Do birds of a feather really flock together, or how to choose training samples for authorship attribution.
Maciej Eder, Jan Rybicki
Published in: Lit. Linguistic Comput. (2013)
Keyphrases
- training samples
- authorship attribution
- feature space
- supervised learning
- training data
- plagiarism detection
- learning algorithm
- digital forensics
- test sample
- number of training samples
- training set
- high dimensional
- source code
- high level
- writing style
- representative samples
- base classifiers
- classification accuracy
- feature selection