A Deduplication Algorithm Based on Data Similarity and Delta Encoding.
Bin Song, Limin Xiao, Guangjun Qin, Li Ruan, Shida Qiu
Published in: GRMSE (2) (2016)
Keyphrases
- input data
- data sets
- dynamic programming
- data processing
- learning algorithm
- similarity matrix
- data analysis
- data structure
- data reduction
- preprocessing
- search space
- noisy data
- training data
- detection algorithm
- similarity measure
- synthetic datasets
- original data
- synthetic data
- neural network
- image data
- objective function
- np hard
- database
- computational complexity
- k means
- particle swarm optimization
- worst case
- data collection
- expectation maximization
- computational cost
- data distribution
- cost function
- probabilistic model
- data quality
- optimal solution
- data mining techniques
- similarity metric