A data labeling method for clustering categorical data.
Fuyuan Cao, Jiye Liang
Published in: Expert Syst. Appl. (2011)
Keyphrases
- synthetic data
- test data
- missing data
- data collection
- data sets
- correlation analysis
- prior information
- prior knowledge
- input data
- data processing
- data quality
- missing values
- detection method
- statistical analysis
- dynamic programming
- significant improvement
- statistical significance
- computational cost
- noisy data
- preprocessing
- statistical methods
- data analysis
- training samples
- cost function
- raw data
- active learning
- similarity measure
- information loss
- pairwise
- high resolution
- data distribution
- support vector machine
- unsupervised learning
- probabilistic model
- probability distribution
- feature set
- high accuracy