Emergent Mixture-of-Experts: Can Dense Pre-trained Transformers Benefit from Emergent Modular Structures?

Zihan Qiu, Zeyu Huang, Jie Fu
Published in: CoRR (2023)
Keyphrases
  • pre-trained
  • viewpoint
  • data sets
  • machine learning
  • wide range