Improving Document Clustering by Eliminating Unnatural Language.
Myungha Jang, Jinho D. Choi, James Allan
Published in: CoRR (2017)
Keyphrases
- document clustering
- topic extraction
- clustering algorithm
- text mining
- clustering method
- document representation
- document collections
- text documents
- vector space model
- natural language
- document clusters
- non-negative matrix factorization
- k-means
- tolerance rough set
- cluster analysis
- latent semantic indexing
- active learning
- information retrieval
- databases