LakeBench: Benchmarks for Data Discovery over Data Lakes.
Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz
Published in: CoRR (2023)
Keyphrases
- data sets
- raw data
- data sources
- knowledge discovery
- experimental data
- data collection
- training data
- missing data
- high quality
- image data
- synthetic data
- historical data
- database
- noisy data
- original data
- input data
- data processing
- statistical analysis
- data points
- attribute values
- relational databases
- data distribution
- statistical methods
- data analysis
- database systems
- case study
- complex data
- feature selection