FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.

Published in: Proc. ACM Manag. Data (2023)

Keyphrases