Data Deduplication Based on Hadoop.
Dongzhan Zhang, Chengfa Liao, Wenjing Yan, Ran Tao, Wei Zheng
Published in: CBD (2017)
Keyphrases
- synthetic data
- data sets
- big data
- raw data
- data sources
- statistical analysis
- original data
- high quality
- data analysis
- data structure
- image data
- open source
- data collection
- training data
- neural network
- data quality
- small number
- complex data
- high dimensional data
- computer systems
- end users
- xml documents
- real world
- databases