A container-based workflow for distributed training of deep learning algorithms in HPC clusters.
Jose González-Abad, Álvaro López García, Valentin Y. Kozlov
Published in: Clust. Comput. (2023)
Keyphrases
- deep architectures
- learning algorithm
- training examples
- distributed systems
- deep learning
- supervised learning
- loosely coupled
- unsupervised learning
- clustering algorithm
- high performance computing
- cluster analysis
- multilayer neural networks
- machine learning
- distributed environment
- machine learning algorithms
- training samples
- training algorithm
- training process
- learning machines
- fault tolerance
- training set
- active learning
- peer to peer
- fuzzy clustering
- fault tolerant
- workflow execution
- reinforcement learning
- computing infrastructure
- training and test data
- back propagation
- batch mode
- workflow systems
- semi supervised
- energy efficiency
- computing environments
- hierarchical clustering
- document clustering