Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention.

Tong Yu, Ruslan Khalitov, Lei Cheng, Zhirong Yang
Published in: CoRR (2022)
Keyphrases
  • dot product
  • positive semidefinite
  • similarity function
  • kernel function
  • scalar product
  • feature space
  • sparse representation
  • Gaussian kernels
  • image processing
  • high dimensional