On generating large-scale ground truth datasets for the deduplication of bibliographic records.
James Alistair Hammerton, Michael Granitzer, Dan Harvey, Maya Hristakeva, Kris Jack
Published in: WIMS (2012)
Keyphrases
- ground truth
- record linkage
- bibliographic information
- scientific data analysis
- database
- small scale
- digital libraries
- real world
- UCI machine learning repository
- real life
- human subjects
- ground truth data
- automatically generating
- databases
- image classification
- data cleaning
- data records
- data sources
- case study
- neural network
- massive graphs