A sample-and-clean framework for fast and accurate query processing on dirty data.
Jiannan Wang, Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo
Published in: SIGMOD Conference (2014)
Keyphrases
- data sets
- query processing
- image data
- data structure
- data sources
- data collection
- database
- high quality
- original data
- experimental data
- training data
- data management
- raw data
- input data
- statistical analysis
- test data
- data mining techniques
- data quality
- sample size
- missing data
- small number
- knowledge discovery
- databases
- data processing
- high dimensional data
- spatial data
- xml documents
- data distribution
- bayesian networks
- database systems
- metadata
- data samples
- random sample