Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping.
Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
Published in: MLSys (2024)
Keyphrases
- graph theory
- communities in social networks
- training process
- subgraph isomorphism
- training set
- graph structure
- information sharing
- overlapping communities
- communication systems
- random walk
- training examples
- connected components
- mixture model
- subject matter experts
- graph theoretic
- graph representation
- directed acyclic graph
- data sets
- domain experts
- structured data
- test set
- communication cost
- graph matching
- directed graph
- graph model
- communication technologies
- parallel algorithm
- graph based algorithm
- training samples
- neural network