MoEfication: Transformer Feed-forward Layers are Mixtures of Experts.
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Published in: ACL (Findings) (2022)
Keyphrases
- feed forward
- back propagation
- neural nets
- artificial neural networks
- neural network
- biologically plausible
- recurrent neural networks
- multi layer
- neural architecture
- feed forward neural networks
- hidden layer
- domain experts
- activation function
- mixture model
- error back propagation
- recurrent networks
- expectation maximization
- artificial neural
- multiple layers
- machine learning