Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.

Published in: ISCA (2024)

Keyphrases