Improving Document Clustering by Removing Unnatural Language.
Myungha Jang, Jinho D. Choi, James Allan. Published in: NUT@EMNLP (2017)
Keyphrases
- document clustering
- topic extraction
- text mining
- document collections
- clustering method
- non-negative matrix factorization
- clustering algorithm
- document representation
- text documents
- vector space model
- tolerance rough set
- document clusters
- cluster analysis
- information retrieval systems
- pairwise constraints
- document corpus
- text categorization
- clustering approaches
- co-occurrence
- natural language
- machine learning