Extreme Binning: Scalable, parallel deduplication for chunk-based file backup.
Deepavali Bhagwat, Kave Eshghi, Darrell D. E. Long, Mark Lillibridge
Published in: MASCOTS (2009)
Keyphrases
- high availability
- scalable distributed data structure
- database
- multi processor
- distributed memory
- parallel processing
- massively parallel
- lightweight
- real time
- single processor
- genetic algorithm
- database systems
- record linkage
- case study
- data cleaning
- web scale
- information systems
- parallel architectures
- social networks
- file organization
- parallel hardware
- machine learning