On Scalability of Distributed Machine Learning with Big Data on Apache Spark.
Ameen Abdel Hai, Babak Forouraghi
Published in: BigData Congress (2018)
Keyphrases
- big data
- map reduce
- cloud computing
- machine learning
- data intensive
- data intensive computing
- data science
- data analysis
- commodity hardware
- data management
- data processing
- knowledge discovery
- big data analytics
- social media
- distributed systems
- open source
- high volume
- fault tolerant
- fault tolerance
- massive data
- vast amounts of data
- business intelligence
- unstructured data
- social computing
- parallel computation
- predictive modeling
- real world
- massive datasets
- distributed computing
- data mining
- information processing
- text mining
- statistical and machine learning
- distributed environment
- natural language processing
- object oriented
- database