Demystifying data deduplication.
NagaPramod Mandagere, Pin Zhou, Mark A. Smith, Sandeep Uttamchandani
Published in: Middleware (Companion) (2008)
Keyphrases
- data sets
- data analysis
- data points
- database
- data collection
- real time
- application domains
- high quality
- data structure
- synthetic data
- probability distribution
- small number
- complex data
- raw data
- background knowledge
- statistical analysis
- prior knowledge
- training data
- data processing
- data mining techniques
- data sources
- attribute values
- network structure
- information systems
- neural network
- data cleaning