Information gain and divergence-based feature selection for machine learning-based text categorization.
Changki Lee, Gary Geunbae Lee
Published in: Inf. Process. Manag. (2006)
Keyphrases
- information gain
- text categorization
- feature selection
- machine learning
- text classification
- mutual information
- chi squared
- naive bayes
- semi-supervised learning
- automated text categorization
- text documents
- support vector machine
- knn
- document frequency
- classification accuracy
- decision trees
- multi-label
- feature set
- feature extraction
- term frequency
- unsupervised learning
- feature selections
- feature subset
- text mining
- text classifiers
- k nearest neighbor
- machine learning algorithms
- tf-idf
- term selection
- model selection
- data mining
- dimensionality reduction
- information extraction
- active learning
- support vector
- labeled data
- supervised learning
- image registration
- pairwise
- feature ranking
- feature space
- data analysis
- computer vision
- information retrieval
- feature selection for text categorization
- correlation based feature selection