Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers.
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
Published in:
CoRR (2024)
Keyphrases
long range
short range
long range correlations
neural network
theoretical guarantees
data mining
memory efficient