Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?

Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda
Published in: Parallel Comput. (2019)
Keyphrases