Distributional Word Clusters vs. Words for Text Categorization.
Ron Bekkerman, Ran El-Yaniv, Naftali Tishby, Yoad Winter
Published in: J. Mach. Learn. Res. (2003)
Keyphrases
- text categorization
- distributional clustering
- document frequency
- word frequency
- text classification
- information theoretic
- text documents
- n gram
- term frequency
- knn
- feature selection
- k nearest neighbor
- term weighting
- clustering algorithm
- co occurrence
- information gain
- multi label
- text classifiers
- training documents
- word sense disambiguation
- multiword
- document clustering
- text collections
- automatic text categorization
- data points
- reuters corpus
- machine learning
- automatic summarization
- tf idf
- data sets
- training data
- unlabeled data
- wordnet
- nearest neighbor