A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm.
Harun Uguz
Published in: Knowl. Based Syst. (2011)
Keyphrases
- text categorization
- information gain
- feature selection
- document frequency
- chi squared
- mutual information
- feature weighting
- genetic algorithm
- feature selection for text categorization
- knn
- decision trees
- classification accuracy
- support vector machine
- text classification
- feature subset
- feature reduction
- k nearest neighbor
- correlation coefficient
- principal component analysis
- image processing
- prior knowledge
- principal components
- multi label
- naive bayes
- support vector machine svm
- unlabeled data
- feature set
- feature space
- text documents
- semi supervised learning
- tf idf
- term frequency
- text classifiers
- labeled data
- model selection
- feature selections
- semi supervised