Zero-Chunk: An Efficient Cache Algorithm to Accelerate the I/O Processing of Data Deduplication.
Hongyuan Gao, Chentao Wu, Jie Li, Minyi Guo
Published in: ICPADS (2016)
Keyphrases
- data processing
- input data
- data sets
- noisy data
- optimal solution
- database
- learning algorithm
- dynamic programming
- hit rate
- data sources
- data analysis
- detection algorithm
- NP-hard
- k-means
- probabilistic model
- computational complexity
- expectation maximization
- missing data
- data objects
- training data
- XML documents
- storage systems
- record linkage
- highly efficient
- hardware implementation
- objective function
- high dimensional data
- clustering method