Similarity based deduplication with small data chunks.
Lior Aronovich, Ron Asher, Danny Harnik, Michael Hirsch, Shmuel T. Klein, Yair Toaff
Published in: Discret. Appl. Math. (2016)
Keyphrases
- data collection
- data sets
- synthetic data
- data processing
- knowledge discovery
- data analysis
- database
- training data
- raw data
- complex data
- historical data
- wireless sensor networks
- data sources
- data points
- data quality
- image data
- experimental data
- application domains
- multimedia data
- data acquisition
- data cleaning
- high dimensional data
- information sources
- input data
- small number
- social media
- xml documents
- high dimensional
- high quality
- databases