An exploratory analysis of methods for real-time data deduplication in streaming processes.
João Esteves, Rosa Maria Costa, Yongluan Zhou, Ana Almeida
Published in: DEBS (2023)
Keyphrases
- real time
- data sets
- data mining methods
- high quality
- data mining techniques
- statistical methods
- data structure
- data analysis
- data points
- data collection
- image data
- significant improvement
- data quality
- high dimensional data
- benchmark datasets
- statistical analysis
- raw data
- spectral clustering
- missing values
- data acquisition
- data reduction
- data distribution
- spatial data
- data processing
- prior knowledge
- synthetic data
- machine learning methods
- input data
- high speed
- human experts
- end users
- preprocessing
- clustering algorithm
- multiple sources
- data mining
- continuous stream