Estimation of deduplication ratios in large data sets.
Danny Harnik, Oded Margalit, Dalit Naor, Dmitry Sotnikov, Gil Vernik
Published in: MSST (2012)
Keyphrases
- data sets
- statistically sound
- databases
- genetic algorithm
- least squares
- density estimation
- estimation algorithm
- feature selection
- decision trees
- similarity measure
- search algorithm
- data analysis
- evolutionary algorithm
- parametric models
- maximum likelihood estimation
- robust estimation
- estimation accuracy
- machine learning