WordNet-Based and N-Grams-Based Document Clustering: A Comparative Study.
Abdelmalek Amine, Zakaria Elberrichi, Michel Simonet, Mimoun Malki
Published in: BroadCom (2008)
Keyphrases
- document clustering
- n-gram
- language model
- text classification
- text documents
- bag of words
- clustering algorithm
- text mining
- language independent
- variable length
- document collections
- document representation
- language modeling
- vector space model
- tf-idf
- clustering method
- document clusters
- k-means
- tolerance rough set
- machine learning
- information retrieval
- vector space
- web documents
- text categorization
- active learning
- digital libraries
- real world