DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models.

Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, Wenfeng Liang
Published in: CoRR (2024)