Text categorization with many redundant features: using aggressive feature selection to make SVMs competitive with C4.5.
Evgeniy Gabrilovich, Shaul Markovitch
Published in: ICML (2004)
Keyphrases
- text categorization
- feature selection
- redundant features
- support vector
- information gain
- text classification
- irrelevant features
- feature selection algorithms
- linear svm
- text documents
- text classifiers
- feature subset
- multi label
- automated text categorization
- multi class
- high dimensionality
- naive bayes
- feature weighting
- support vector machine
- classification accuracy
- feature space
- document classification
- feature set
- model selection
- feature extraction
- knn
- multi task
- k nearest neighbor
- machine learning
- document frequency
- multi label classification
- feature selections
- tf idf
- dimensionality reduction
- nearest neighbor
- active learning
- training data
- decision trees
- information retrieval