Tradeoffs in Scalable Data Routing for Deduplication Clusters.
Wei Dong, Fred Douglis, Kai Li, R. Hugo Patterson, Sazzala Reddy, Philip Shilane
Published in: FAST (2011)
Keyphrases
- data sets
- data points
- data collection
- training data
- data structure
- data distribution
- high quality
- data analysis
- data sources
- synthetic data
- data processing
- data samples
- data objects
- clustering algorithm
- sensor data
- high dimensional data
- data quality
- network structure
- input space
- spatial data
- database
- data cleaning
- computer systems
- input data
- small number
- image data
- prior knowledge
- high dimensional
- website
- feature selection
- learning algorithm