Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach.
Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada
Published in: CoRR (2020)
Keyphrases
- data analysis
- data sets
- high dimensional
- data sources
- database
- sparse data
- data collection
- data mining
- data points
- image data
- raw data
- statistical analysis
- high dimensional data
- data retrieval
- input space
- missing data
- input data
- data processing
- data model
- distance function
- gene expression data
- low dimensional
- original data
- parameter space
- knowledge discovery
- satellite images
- probability distribution
- data structure