Choose Your Words Carefully: An Empirical Study of Feature Selection Metrics for Text Classification.
George Forman
Published in: PKDD (2002)
Keyphrases
- text classification
- feature selection
- n gram
- text documents
- distributional clustering
- text categorization
- training corpus
- training documents
- bag of words
- naive bayes
- document frequency
- text mining
- machine learning
- information gain
- classification accuracy
- mutual information
- labeled data
- automatic text classification
- supervised feature selection
- feature weighting
- document representation
- multi label
- knn
- feature engineering
- support vector machine
- text data
- term frequency
- chi squared
- k nearest neighbor
- feature set
- text classifiers
- semantic features
- sentiment classification
- unlabeled data
- evaluation metrics
- sentiment analysis
- feature selection algorithms
- part of speech
- data cleaning
- feature space
- language modeling
- word sense disambiguation
- feature subset
- keywords
- training data
- feature extraction
- support vector
- dimensionality reduction
- neural network