Filter large-scale engine data using Apache Spark.
Donato Pirozzi, Vittorio Scarano, Steven Begg, Guillaume de Sercey, Andrew Fish, Andrew Harvey
Published in: INDIN (2016)
Keyphrases
- data sets
- data collection
- database
- complex data
- statistical analysis
- data analysis
- prior knowledge
- training data
- high quality
- real world
- data objects
- raw data
- data distribution
- missing data
- synthetic data
- data processing
- knowledge discovery
- data sources
- data structure
- data management
- experimental data
- association rules
- original data
- multiscale
- data streams