Optimizing Checkpoint Restart with Data Deduplication.
Zhengyu Chen, Jianhua Sun, Hao Chen
Published in: Sci. Program. (2016)
Keyphrases
- high quality
- data processing
- databases
- data structure
- database
- prior knowledge
- noisy data
- raw data
- data analysis
- data sets
- original data
- data distribution
- application domains
- missing data
- synthetic data
- data collection
- data points
- training data
- computer systems
- random walk
- high dimensional data
- attribute values
- experimental data
- decision trees
- multimedia data
- metadata
- information retrieval
- machine learning
- big data