Large Scale Distributed Data Science using Apache Spark.
James G. Shanahan, Liang Dai
Published in: KDD (2015)
Keyphrases
- distributed data
- data sharing
- distributed data mining
- data distribution
- integrating heterogeneous
- open source
- databases
- distributed data sources
- communication cost
- file system
- semantically heterogeneous
- open source software
- multi dimensional
- nearest neighbor
- decision making
- web server
- object oriented
- feature space
- pattern recognition
- face recognition
- real world
- neural network
- data sets