A Comparison and Semi-Quantitative Analysis of Words and Character-Bigrams as Features in Chinese Text Categorization.
Jingyang Li, Maosong Sun, Xian Zhang
Published in: ACL (2006)
Keyphrases
- text categorization
- quantitative analysis
- feature generation
- information gain
- qualitative analysis
- training documents
- feature weighting
- text documents
- text classification
- multi-label
- feature reduction
- qualitative evaluation
- distributional clustering
- n-gram
- kNN
- word segmentation
- Chinese characters
- feature selection
- document frequency
- k-nearest neighbor
- text classifiers
- Reuters corpus
- prior knowledge
- feature vectors
- feature set
- co-occurrence
- semi-supervised learning
- word frequency
- automatic text categorization
- term frequency
- feature selections
- feature extraction
- information retrieval
- qualitative and quantitative analysis
- feature selection for text categorization
- keywords
- feature space
- active learning
- tf-idf
- language model
- word sense disambiguation