A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Siddharth Singh
Olatunji Ruwase
Ammar Ahmad Awan
Samyam Rajbhandari
Yuxiong He
Abhinav Bhatele
Published in:
ICS (2023)
Keyphrases
data parallelism
combinatorial search
distributed computing
parallel tree search
parallel programming
machine learning
general purpose
higher order
parallel processing
distributed memory
supervised learning