Hierarchical Linkage Clustering with Distributions of Distances for Large-Scale Record Linkage.
Samuel L. Ventura, Rebecca Nugent, Erica R. H. Fuchs
Published in: Privacy in Statistical Databases (2014)
Keyphrases
- record linkage
- hierarchical clustering
- duplicate detection
- data cleaning
- entity resolution
- privacy preserving
- clustering algorithm
- multiple databases
- clustering method
- k-means
- approximate matching
- dissimilarity measure
- unsupervised learning
- distance measure
- census data
- linked data
- agglomerative clustering
- categorical data
- euclidean distance
- group membership
- hierarchical tree
- artificial intelligence
- cluster analysis
- disclosure risk
- probability distribution
- databases
- single linkage