Understanding and improving disk-based intermediate data caching in Spark.
Kaihui Zhang, Yusuke Tanimura, Hidemoto Nakada, Hirotaka Ogawa
Published in: IEEE BigData (2017)
Keyphrases
- data sets
- data collection
- data structure
- raw data
- data analysis
- data points
- knowledge discovery
- sensor data
- statistical analysis
- high quality
- complex data
- noisy data
- small number
- data processing
- information retrieval
- data distribution
- synthetic data
- end users
- data mining techniques
- experimental data
- image data
- original data
- data objects
- prior knowledge
- feature space