Less is More: Parameter-Free Text Classification with Gzip.
Zhiying Jiang, Matthew Y. R. Yang, Mikhail Tsirlin, Raphael Tang, Jimmy Lin
Published in: CoRR (2022)
Keyphrases
- parameter free
- text classification
- data compression
- text categorization
- categorical data
- bag of words
- feature selection
- outlier detection
- text mining
- machine learning
- data cleaning
- labeled data
- text classifiers
- text data
- text documents
- naive bayes
- n gram
- knn
- unsupervised learning
- multi label
- information extraction
- compression ratio
- semantic features
- database
- similarity measure
- pairwise
- fully automatic
- semi supervised
- image classification