Stemming and lemmatization in the clustering of Finnish text documents.
Tuomo Korenius, Jorma Laurikkala, Kalervo Järvelin, Martti Juhola
Published in: CIKM (2004)
Keyphrases
- text documents
- document clustering
- text clustering
- text mining
- text classification
- text categorization
- document classification
- wordnet
- information extraction
- bag of words
- topic models
- clustering algorithm
- keywords
- tf idf
- text data
- text representation
- named entities
- clustering method
- k means
- hierarchical clustering
- text collections
- unsupervised learning
- n gram
- information retrieval
- data sets
- supervised learning
- knowledge representation
- image segmentation
- knowledge base
- machine learning
- data mining