Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers.

Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu
Published in: CoRR (2024)
Keyphrases
  • long range
  • short range
  • long range correlations
  • neural network
  • theoretical guarantees
  • data mining
  • memory efficient