SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention.
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
Published in:
CoRR (2023)
Keyphrases
data sets
gaussian mixture model
focus of attention
case study
bayesian networks
multiscale
expectation maximization
mixture model
expert advice