TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training.

Chang Chen, Min Li, Zhihua Wu, Dianhai Yu, Chao Yang

Published in: NeurIPS (2022)

Keyphrases

small scale
real world
topology preservation
training algorithm
training process
human experts
training examples
supervised learning
training set
case study
probability distribution
real life
feature selection
training phase
web scale
expert advice
neural network