A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.

Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele
Published in: ICS (2023)
Keyphrases
  • data parallelism
  • combinatorial search
  • distributed computing
  • parallel tree search
  • parallel programming
  • machine learning
  • general purpose
  • higher order
  • parallel processing
  • distributed memory
  • supervised learning