Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach.
Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada
Published in: ICDE (2021)
Keyphrases
- data points
- high dimensional
- database
- data sets
- sparse data
- data processing
- knowledge discovery
- data structure
- data collection
- experimental data
- high dimensional data
- training data
- association rules
- high dimensional spaces
- statistical analysis
- data mining techniques
- feature space
- high quality
- data sources
- dimensionality reduction
- data analysis
- sensor data
- discrete data
- databases
- data retrieval
- data quality
- noisy data
- original data
- user defined
- attribute values
- distance function
- synthetic data
- similarity search
- image data
- small number
- distance measure