Droplet: A Distributed Solution of Data Deduplication.
Yang Zhang, Yongwei Wu, Guangwen Yang
Published in: GRID (2012)
Keyphrases
- data sets
- synthetic data
- data processing
- data collection
- data sources
- data quality
- image data
- data mining techniques
- xml documents
- raw data
- distributed environment
- statistical analysis
- training data
- input data
- database
- data analysis
- data management
- distributed systems
- multi agent
- optimal solution
- data structure
- high dimensional data
- metadata
- missing data
- data mining algorithms
- data distribution
- data acquisition
- query processing
- neural network
- heterogeneous data
- data cleaning