Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost.

Published in: NeurIPS (2022)

Keyphrases