Sceadan: Using Concatenated N-Gram Vectors for Improved File and Data Type Classification.
Nicole Lang Beebe, Laurence A. Maddox, Lishu Liu, Minghe Sun
Published in: IEEE Trans. Inf. Forensics Secur. (2013)
Keyphrases
- n-gram
- data types
- feature vectors
- text classification
- language model
- data structure
- machine learning
- data model
- feature extraction
- database systems
- language modeling
- classification accuracy
- language modelling
- language independent
- preprocessing
- decision trees
- database management systems
- word segmentation
- feature selection
- variable length
- data mining
- retrieval model
- feature space
- text categorization
- abstract data types
- management system
- inside outside algorithm