Interpretable and reconfigurable clustering of document datasets by deriving word-based rules.
Vipin Balachandran, Deepak P, Deepak Khemani. Published in: Knowl. Inf. Syst. (2012)
Keyphrases
- document clustering
- classification rules
- clustering algorithm
- synthetic datasets
- keywords
- text corpus
- association rules
- information retrieval
- tolerance rough set
- spoken document retrieval
- data mining tasks
- document images
- clustering method
- k-means
- text clustering
- clustering approaches
- high-dimensional datasets
- information retrieval systems
- latent topics
- synthetic and real datasets
- cluster analysis
- related words
- cluster membership
- low cost
- co-occurrence
- word level
- document space
- word recognition
- noun phrases
- document analysis
- term weighting
- tf-idf
- syntactic categories
- text documents
- retrieval systems
- semantic information
- data mining
- word co-occurrence
- word clouds
- printed documents
- related documents
- term frequency
- vector space model
- web documents
- document retrieval
- n-gram