Frequency Based Chunking for Data De-Duplication.
Guanlin Lu, Yu Jin, David H. C. Du
Published in: MASCOTS (2010)
Keyphrases
- data sets
- data sources
- data analysis
- computer systems
- probability distribution
- experimental data
- database
- data collection
- statistical analysis
- information systems
- data mining
- real time
- missing data
- data processing
- data distribution
- raw data
- original data
- databases
- input data
- high quality
- shallow parsing
- data quality
- noisy data
- data objects
- training data
- statistical methods
- missing values
- data structure
- application domains
- spatial data
- video sequences
- prior knowledge
- knowledge discovery
- image data