A quantitative analysis of fault tolerance mechanisms for parallel machine learning systems with parameter servers.
Mingxi Li, Yusuke Tanimura, Hidemoto Nakada
Published in: IMCOM (2017)
Keyphrases
- quantitative analysis
- fault tolerance
- machine learning systems
- fault tolerant
- single point of failure
- machine learning
- qualitative analysis
- distributed systems
- machine learning algorithms
- distributed computing
- response time
- group communication
- load balancing
- peer to peer
- high availability
- qualitative and quantitative analysis
- database replication
- learning systems
- mobile agents
- qualitative evaluation
- fault management
- learning classifier systems
- replicated databases
- data center
- high performance computing
- supervised learning algorithms