A benchmark comparison of deterministic and probabilistic methods for defining manual review datasets in duplicate records reconciliation.
Erel Joffe, Michael J. Byrne, Phillip Reeder, Jorge R. Herskovic, Craig W. Johnson, Allison B. McCoy, Dean F. Sittig, Elmer V. Bernstam
Published in: J. Am. Medical Informatics Assoc. (2014)
Keyphrases
- probabilistic approaches
- benchmark datasets
- machine learning methods
- databases
- experimental results on real world
- benchmark data sets
- computational cost
- preprocessing
- case study
- significant improvement
- high dimensional
- data mining techniques
- empirical studies
- feature selection
- qualitative and quantitative
- high dimensional datasets
- uci machine learning repository
- genetic algorithm
- data sets