Zero-Cost, Arrow-Enabled Data Interface for Apache Spark.
Sebastiaan Alvarez Rodriguez, Jayjeet Chakraborty, Aaron Chu, Ivo Jimenez, Jeff LeFevre, Carlos Maltzahn, Alexandru Uta
Published in: CoRR (2021)
Keyphrases
- data sets
- data sources
- data processing
- database
- original data
- raw data
- complex data
- missing data
- data collection
- input data
- storage space
- open source
- data structure
- small number
- experimental data
- data quality
- training data
- knowledge discovery
- data points
- synthetic data
- end users
- data streams
- noisy data
- neural network
- cost benefit analysis