PACk: An Efficient Partition-based Distributed Agglomerative Hierarchical Clustering Algorithm for Deduplication.
Yue Wang, Vivek R. Narasayya, Yeye He, Surajit Chaudhuri
Published in: Proc. VLDB Endow. (2022)
Keyphrases
- clustering algorithm
- agglomerative hierarchical
- agglomerative hierarchical clustering
- cluster analysis
- hierarchical clustering algorithm
- semi-supervised clustering
- data clustering
- fuzzy clustering
- k-means
- fuzzy c-means
- clustering method
- clustering result
- hierarchical clustering
- spectral clustering
- document clustering
- arbitrary shape
- document collections
- active learning
- query processing