Emergent Mixture-of-Experts: Can Dense Pre-trained Transformers Benefit from Emergent Modular Structures?
Zihan Qiu
Zeyu Huang
Jie Fu
Published in:
CoRR (2023)
Keyphrases
pre-trained
viewpoint
data sets
machine learning
wide range