HARRA: fast iterative hashed record linkage for large-scale data collections.
Hung-sik Kim, Dongwon Lee
Published in: EDBT (2010)
Keyphrases
- record linkage
- data collections
- duplicate detection
- semi-structured
- privacy preserving
- data collection
- approximate matching
- entity resolution
- multiple databases
- data cleaning
- document collections
- linked data
- data sources
- digital data
- disclosure risk
- data sets
- databases
- xml data
- case study
- database
- group membership
- hash functions
- expert systems
- data structure
- database systems
- information systems
- data mining
- census data