MoEfication: Transformer Feed-forward Layers are Mixtures of Experts.
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Published in: ACL (Findings) (2022)
Keyphrases
- feed forward
- back propagation
- neural nets
- artificial neural networks
- neural network
- biologically plausible
- recurrent neural networks
- multi layer
- neural architecture
- feed forward neural networks
- hidden layer
- domain experts
- activation function
- mixture model
- error back propagation
- recurrent networks
- expectation maximization
- artificial neural
- multiple layers
- machine learning