On the Expressivity Role of LayerNorm in Transformers' Attention.

Shaked Brody, Uri Alon, Eran Yahav
Published in: CoRR (2023)
Keyphrases
  • special case
  • real time
  • neural network
  • computer vision
  • similarity measure
  • bayesian networks
  • multiscale
  • digital libraries
  • visual attention