A Comprehensive Study of the Past, Present, and Future of Data Deduplication.
Wen Xia, Hong Jiang, Dan Feng, Fred Douglis, Philip Shilane, Yu Hua, Min Fu, Yucheng Zhang, Yukun Zhou
Published in: Proc. IEEE (2016)
Keyphrases
- data sets
- data collection
- data analysis
- raw data
- experimental data
- training data
- high quality
- data processing
- input data
- record linkage
- complex data
- data distribution
- application domains
- synthetic data
- data structure
- data points
- prior knowledge
- training set
- data streams
- high dimensional data
- domain experts
- spatial data
- case study
- information systems
- machine learning
- network structure
- real world
- data quality
- neural network
- databases