De-identifying Socioeconomic Data at the Census Tract Level for Medical Research Through Constraint-based Clustering.
Yongtai Liu, Douglas Conway, Zhiyu Wan, Murat Kantarcioglu, Yevgeniy Vorobeychik, Bradley A. Malin
Published in: AMIA (2021)
Keyphrases
- data collection
- data sets
- statistical analysis
- data points
- training data
- database
- data processing
- categorical data
- data mining techniques
- data sources
- high quality
- knowledge discovery
- data analysis
- data structure
- unsupervised learning
- clustering algorithm
- synthetic data
- missing data
- original data
- data objects
- data quality
- clustering analysis
- synthetic datasets
- fuzzy clustering
- high dimensional data
- data streams
- xml documents