SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention.

Róbert Csordás, Piotr Piekos, Kazuki Irie, Jürgen Schmidhuber
Published in: CoRR (2023)
Keyphrases
  • data sets
  • gaussian mixture model
  • focus of attention
  • case study
  • bayesian networks
  • multiscale
  • expectation maximization
  • mixture model
  • expert advice