MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts.

Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu
Published in: CoRR (2024)