Large Scale Distributed Data Science from scratch using Apache Spark 2.0.
James Shanahan, Liang Dai
Published in: WWW (Companion Volume) (2017)
Keyphrases
- distributed data
- data sharing
- distributed data mining
- open source
- integrating heterogeneous
- data distribution
- distributed data sources
- communication cost
- neural network
- data mining algorithms
- databases
- information retrieval
- information sharing
- web server
- distributed databases
- data access
- file system
- nearest neighbor
- knn
- data mining
- real world