Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée J. Miller
Published in: CoRR (2022)
Keyphrases
- data sets
- original data
- experimental data
- high quality
- knowledge discovery
- prior knowledge
- supervised learning
- synthetic data
- data collection
- database
- training data
- learning process
- active learning
- training dataset
- databases
- raw data
- input data
- data processing
- learning algorithm
- automatically discovering
- satellite images
- machine learning
- background knowledge
- missing data
- online learning
- feature set
- reinforcement learning
- data structure
- probability distribution
- data analysis