An Extensive Empirical Study of Feature Selection Metrics for Text Classification.
George Forman
Published in: J. Mach. Learn. Res. (2003)
Keyphrases
- text classification
- feature selection
- text categorization
- bag of words
- text classifiers
- naive Bayes
- text data
- information gain
- unsupervised learning
- mutual information
- sentiment analysis
- n-gram
- feature engineering
- web page classification
- feature set
- text documents
- labeled data
- feature weighting
- kNN
- k-nearest neighbor
- feature space
- chi-squared
- machine learning
- irrelevant features
- data cleaning
- feature ranking
- evaluation metrics
- software defect prediction
- supervised feature selection
- high dimensionality
- classification accuracy
- model selection
- dimensionality reduction
- text mining
- multi-class
- support vector machine
- multi-label
- semi-supervised learning
- selected features
- graph cuts
- knowledge discovery
- information retrieval
- neural network
- data sets