FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.

Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui
Published in: Proc. ACM Manag. Data (2023)
Keyphrases
  • probabilistic model
  • viewpoint
  • real time
  • reinforcement learning
  • wide range
  • prior knowledge