Design and implementation of a Bloom filter-based data deduplication algorithm for efficient data management.
Young-Hwan Jang, Nam-Uk Lee, Hyung-Jun Kim, Seok-Cheon Park
Published in: J. Ambient Intell. Humaniz. Comput. (2024)
Keyphrases
- input data
- data management
- detection algorithm
- data sets
- noisy data
- data reduction
- data sources
- single pass
- efficient implementation
- data analysis
- data structure
- clustering method
- NP-hard
- computationally efficient
- data processing
- highly optimized
- spatial data
- data distribution
- spectral clustering
- heterogeneous data
- complexity analysis
- high efficiency
- synthetic datasets
- learning algorithm
- database systems
- expectation maximization
- knowledge discovery
- preprocessing
- user interface
- evolutionary algorithm
- hardware architecture
- feature selection
- optimal solution
- parallel implementation
- segmentation algorithm
- worst case
- data collection
- outlier detection
- database management systems
- synthetic data
- design process