Effect of Corpus Size Selection on Performance of Map-Reduce Based Distributed K-Means for Big Textual Data Clustering.
Shwet Ketu, Bakshi Rohit Prasad, Sonali Agarwal
Published in: ICCCT (2015)
Keyphrases
- data clustering
- map reduce
- k means
- cloud computing
- clustering algorithm
- spectral clustering
- open source
- efficient implementation
- cluster analysis
- parallel computation
- recently developed
- unsupervised learning
- community detection
- big data
- distributed systems
- parallel computing
- join operations
- rough k means
- clustering method
- self organizing maps
- clustering quality
- data distribution
- expectation maximization
- semi supervised
- metadata
- clustering solutions
- databases