BOA: batch orchestration algorithm for straggler mitigation of distributed DL training in heterogeneous GPU cluster.
Eunju Yang, Dong-Ki Kang, Chan-Hyun Youn. Published in: J. Supercomput. (2020)
Keyphrases
- computational complexity
- preprocessing
- learning algorithm
- optimal solution
- multi agent
- detection algorithm
- objective function
- times faster
- k means
- worst case
- parallel implementation
- clustering method
- optimization algorithm
- dynamic programming
- cost function
- search space
- expectation maximization
- hierarchical clustering
- training process
- search algorithm
- training phase
- parallel computation