An Extended Document Frequency Metric for Feature Selection in Text Categorization.
Yan Xu, Bin Wang, Jintao Li, Hongfang Jing
Published in: AIRS (2008)
Keyphrases
- text categorization
- document frequency
- feature selection
- information gain
- term frequency
- text classification
- term selection
- kNN
- text documents
- k-nearest neighbor
- naive bayes
- tf-idf
- unlabeled data
- n-gram
- document representation
- mutual information
- feature set
- information retrieval
- distance measure
- feature extraction
- support vector
- similarity search
- semi-supervised learning
- labeled data