Evaluating Representativeness in PDF Malware Datasets: A Comparative Study and a New Dataset.
Ran Liu, Robert J. Joyce, Cynthia Matuszek, Charles Nicholas
Published in: BigData (2023)
Keyphrases
- synthetic datasets
- benchmark datasets
- training dataset
- high dimensional datasets
- pascal voc
- uci datasets
- database
- image dataset
- massive datasets
- probability density function
- million images
- object detection
- mixture model
- bag of words
- comparative study
- reverse engineering
- artificial and real world datasets
- standard learning algorithms
- malware detection
- class imbalanced data
- object recognition
- static analysis
- dynamic analysis
- test data
- maximum likelihood
- feature set
- probabilistic model
- probability distribution function
- real life
- learning algorithm