基于数据摘要奇偶性的集合相似性近似算法 (Set Similarity Approximation Algorithm Based on Parity of Data Sketch).
Jianwei Jia, Ling Chen
Published in: 计算机科学 (2016)
Keyphrases
- input data
- small number
- data sets
- representative set
- similarity matrix
- noisy data
- NP-hard
- sample set
- user defined
- learning algorithm
- data reduction
- error bounds
- original data
- high dimensional data
- clustering method
- detection algorithm
- probability distribution
- dynamic programming
- preprocessing
- sufficient statistics
- similarity measure
- search space
- objective function
- data structure
- data points
- error tolerance
- distance metric
- k-means
- initial set
- synthetic datasets
- feature space
- approximation ratio
- similarity function
- worst case
- convex hull
- XML documents
- distance function
- expectation maximization