Acquaintance: A Novel Vector-Space N-Gram Technique for Document Categorization.
Stephen Huffman, Marc Damashek
Published in: TREC (1994)
Keyphrases
- document categorization
- n-gram
- vector space
- text classification
- latent semantic indexing
- document representation
- vector space model
- language model
- bag of words
- text categorization
- retrieval model
- distance measure
- similarity search
- language modeling
- document classification
- meta learning
- low dimensional
- tf-idf
- web documents
- machine learning
- text documents
- unlabeled data
- labeled data
- knn
- feature vectors
- naive bayes
- clustering algorithm
- feature selection
- text classifiers
- information retrieval
- probabilistic model
- high dimensional