CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism.
Henk Dreuning, Kees Verstoep, Henri E. Bal, Rob V. van Nieuwpoort
Published in: HiPC (2023)
Keyphrases
- training process
- distributed systems
- computational power
- training set
- distributed environment
- lightweight
- training algorithm
- memory requirements
- parallel processing
- load balance
- test set
- peer to peer
- parallel execution
- communication cost
- distributed computing
- training phase
- memory usage
- data transfer
- neural network
- level parallelism
- secure information sharing
- dynamically created
- commodity hardware
- replica selection
- shared memory
- main memory
- online learning
- query processing
- cooperative