Draining the Data Swamp: A Similarity-based Approach.
Will Brackenbury, Rui Liu, Mainack Mondal, Aaron J. Elmore, Blase Ur, Kyle Chard, Michael J. Franklin
Published in: HILDA@SIGMOD (2018)
Keyphrases
- data sets
- data processing
- database
- data sources
- high quality
- image data
- original data
- missing data
- synthetic data
- data analysis
- data structure
- small number
- input data
- data collection
- statistical analysis
- raw data
- prior knowledge
- data points
- dimensionality reduction
- training data
- website
- distance function
- social networks
- search engine
- data quality