Introduction to Text Classification: Impact of Stemming and Comparing TF-IDF and Count Vectorization as Feature Extraction Technique.
André Wendland, Marco Zenere, Jörg Niemann
Published in: EuroSPI (2021)
Keyphrases
- tf idf
- text classification
- text documents
- text categorization
- feature extraction
- term frequency
- stop words
- feature selection
- n gram
- information retrieval
- weighting scheme
- text mining
- retrieval model
- term weighting
- bag of words
- document clustering
- vector space model
- retrieval effectiveness
- machine learning
- image classification
- language model
- knn
- k nearest neighbor
- face recognition
- neural network
- unlabeled data
- labeled data
- ranking algorithm
- language modeling
- feature set
- support vector machine
- feature space