Documents as a Bag of Maximal Substrings - An Unsupervised Feature Extraction for Document Clustering.
Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri
Published in: ICEIS (1) (2011)
Keyphrases
- document clustering
- feature extraction
- document collections
- text documents
- document representation
- text mining
- clustering method
- document clusters
- clustering algorithm
- document similarity
- bag of words
- document set
- supervised learning
- similar documents
- vector space model
- topic extraction
- topic detection
- tf-idf
- automatic categorization
- image classification
- feature vectors
- semi-supervised
- k-means
- related documents
- feature selection
- feature space
- pairwise constraints
- cluster analysis
- unsupervised learning
- data mining
- tolerance rough set
- document categorization
- document retrieval