Acquaintance: Language-Independent Document Categorization by N-Grams.
Stephen Huffman
Published in: TREC (1995)
Keyphrases
- language independent
- n-gram
- document categorization
- text classification
- text categorization
- bag of words
- document representation
- text documents
- text mining
- language model
- word level
- language modeling
- part of speech
- knn
- feature selection
- word segmentation
- k nearest neighbor
- cross lingual
- labeled data
- data analysis
- vector space model
- machine learning
- text retrieval
- search engine
- neural network