A comparative study of TF*IDF, LSI and multi-words for text classification.
Wen Zhang, Taketoshi Yoshida, Xijin J. Tang
Published in: Expert Syst. Appl. (2011)
Keyphrases
- tf idf
- text classification
- text documents
- text categorization
- term frequency
- document representation
- bag of words
- text mining
- latent semantic indexing
- document clustering
- n gram
- weighting scheme
- stop words
- vector space model
- text data
- document frequency
- knn
- labeled data
- feature selection
- machine learning
- term weighting
- k nearest neighbor
- unlabeled data
- information retrieval
- keywords
- natural language processing
- vector space
- information extraction
- named entities
- digital libraries
- training data
- multimedia