Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers.

Freya Behrens, Luca Biggio, Lenka Zdeborová

Published in: CoRR (2024)

Keyphrases

feed forward
back propagation
neural nets
artificial neural networks
neural network
recurrent neural networks
hidden layer
biologically plausible
feed forward neural networks
visual cortex
recurrent networks
activation function
knowledge base
single layer
training algorithm
multi layer
neural architecture
spiking neural networks
multiple layers
input image
spiking neurons
artificial neural