Raising High-Degree Overlapped Character Bigrams into Trigrams for Dimensionality Reduction in Chinese Text Categorization.
Xue Dejun, Maosong Sun
Published in: CICLing (2004)
Keyphrases
- text categorization
- pattern recognition and machine learning
- dimensionality reduction
- n gram
- feature selection
- text classification
- part of speech
- multi label
- text documents
- naive bayes
- knn
- information gain
- k nearest neighbor
- reuters corpus
- term frequency
- semi supervised learning
- term weighting
- document frequency
- automated text categorization
- automatic text categorization
- high dimensional data
- unsupervised learning
- language model
- active learning
- feature space
- feature extraction
- bag of words
- linear discriminant analysis
- unlabeled data
- text classifiers
- nearest neighbor
- data points
- high dimensional
- multi instance multi label learning