Preemptive All-reduce Scheduling for Expediting Distributed DNN Training.
Yixin Bao, Yanghua Peng, Yangrui Chen, Chuan Wu
Published in: INFOCOM (2020)
Keyphrases
- scheduling algorithm
- scheduling problem
- training process
- meeting scheduling
- distributed systems
- lower bound
- multi agent
- computational grids
- geographically distributed
- training set
- distributed environment
- dynamic scheduling
- grid environment
- learning algorithm
- fault tolerant
- real time database systems
- response time
- training algorithm
- training phase
- supervised learning
- resource constraints
- distributed data
- test set
- peer to peer
- single machine
- resource allocation
- mobile agents