FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.
Xiaonan Nie
Xupeng Miao
Zilong Wang
Zichao Yang
Jilong Xue
Lingxiao Ma
Gang Cao
Bin Cui
Published in:
Proc. ACM Manag. Data (2023)
Keyphrases
probabilistic model
viewpoint
real time
reinforcement learning
wide range
prior knowledge