Eliminating High-Degree Biased Character Bigrams for Dimensionality Reduction in Chinese Text Categorization.
Xue Dejun, Maosong Sun
Published in: ECIR (2004)
Keyphrases
- text categorization
- dimensionality reduction
- pattern recognition and machine learning
- feature selection
- text classification
- multi label
- automated text categorization
- knn
- reuters corpus
- text classifiers
- text documents
- low dimensional
- high dimensional data
- information gain
- semi supervised learning
- feature weighting
- k nearest neighbor
- naive bayes
- feature selections
- automatic text categorization
- n gram
- feature space
- feature extraction
- unsupervised learning
- principal component analysis
- term frequency
- tf idf
- named entities
- unlabeled data
- data points
- pattern recognition
- feature selection for text categorization