A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings.
Wenyang Liu, Yi Wang, Kejun Wu, Kim-Hui Yap, Lap-Pui Chau
Published in: CoRR (2023)
Keyphrases
- n-gram
- image classification
- text classification
- viterbi algorithm
- language model
- bag of words
- image representation
- language independent
- classification accuracy
- feature vectors
- pseudorandom
- language modeling
- language modelling
- feature space
- decision trees
- machine learning
- probabilistic model
- dynamic programming
- variable length
- data mining
- convolutional neural network
- web documents
- low dimensional
- support vector machine
- feature extraction