Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract).

Danilo Dordevic, Vukasin Bozic, Joseph Thommes, Daniele Coppola, Sidak Pal Singh
Published in: AAAI (2024)