Data Deduplication System Based on Content-Defined Chunking Using Bytes Pair Frequency Occurrence.
Ahmed Sardar M. Saeed, Loay Edwar George
Published in: Symmetry (2020)
Keyphrases
- data sets
- raw data
- data collection
- original data
- data analysis
- data processing
- databases
- experimental data
- high quality
- image data
- natural language processing
- data quality
- multimedia data
- computer systems
- input data
- database
- data structure
- learning algorithm
- information retrieval
- neural network
- data points
- data sources
- feature space
- missing data
- attribute values
- training data
- user defined
- website
- complex data
- named entity recognition
- record linkage