A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele
Published in: CoRR (2023)
Keyphrases
- subject matter experts
- domain experts
- expert advice
- human experts
- training process
- high order
- supervised learning
- domain knowledge
- training samples
- training examples
- training algorithm
- parallel processing
- dimensionality reduction
- training phase
- tensor space
- knowledge base
- test set
- generative model
- knowledge acquisition
- online learning
- higher order