Key-based Blocking of Duplicates in Entity-Independent Probabilistic Data.
Fabian Panse, Wolfram Wingerath, Steffen Friedrich, Norbert Ritter
Published in: ICIQ (2012)
Keyphrases
- data sets
- raw data
- statistical analysis
- synthetic data
- data collection
- data processing
- database
- data acquisition
- experimental data
- uncertain data
- data mining algorithms
- data sources
- prior knowledge
- end users
- data points
- knowledge discovery
- data mining techniques
- training set
- feature space
- data structure
- high quality
- application domains
- training data
- data distribution
- information systems
- data mining
- databases