A Dynamic Chunking Algorithm Approach for Data Deduplication.
Xiandong Yuan, Ang Li
Published in: ICISE (2023)
Keyphrases
- input data
- noisy data
- data analysis
- data sets
- synthetic data
- dynamic programming
- information loss
- data structure
- detection algorithm
- database
- prior information
- learning algorithm
- search space
- data quality
- preprocessing
- recognition algorithm
- incomplete data
- k-means
- knowledge discovery
- missing data
- synthetic datasets
- clustering method
- expectation maximization
- optimization algorithm
- data reduction
- spectral clustering
- data distribution
- data mining techniques
- dynamic environments
- simulated annealing
- optimal solution
- computational cost
- dimensional data
- neural network
- training data
- objective function
- computational complexity
- feature space
- NP-hard
- data sources
- missing values
- probabilistic model
- structured data
- kNN
- data collection