Similarity joins for high-dimensional data using Spark.
Chuitian Rong, Xiaohai Cheng, Ziliang Chen, Na Huo. Published in: Concurr. Comput. Pract. Exp. (2019)
Keyphrases
- high dimensional data
- similarity join
- similarity search
- metric space
- dimensionality reduction
- high dimensional
- high dimensional data sets
- nearest neighbor
- low dimensional
- distance function
- data points
- data analysis
- distance computation
- subspace clustering
- data sets
- edit distance
- dimension reduction
- uncertain data
- indexing techniques
- data distribution
- structural similarity
- similarity queries
- vector space
- hash functions
- join algorithms
- XML data
- bloom filter
- neural network
- similar objects
- R-tree
- database systems
- feature selection
- learning algorithm