Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.
Jacopo Tagliabue
Ciro Greco
Published in:
DEEM@SIGMOD (2024)
Keyphrases
data sets
data analysis
data collection
high quality
data sources
database
data points
data processing
synthetic data
databases
prior knowledge
data distribution
statistical analysis
original data
sensor data
missing data
high dimensional data
labeled data
mutual information
knowledge management
learning algorithm