Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée J. Miller
Published in: Proc. VLDB Endow. (2023)
Keyphrases
- data sets
- image data
- learning process
- data collection
- training data
- synthetic data
- data sources
- data analysis
- learning algorithm
- high quality
- training dataset
- data quality
- prior knowledge
- data structure
- knowledge discovery
- human experts
- original data
- learning tasks
- background knowledge
- online learning
- knowledge acquisition
- database
- xml documents
- missing data
- microarray
- decision trees
- input data
- relational databases
- learned models