Parallel Attention and Feed-Forward Net Design for Pre-training and Inference on Transformers.

Shashank Sonkar, Richard G. Baraniuk

Published in: CoRR (2023)

Keyphrases

feed forward
back propagation
recurrent networks
feed forward neural networks
case study
artificial neural networks
hidden layer
neural network
design process
training examples
neural nets
activation function
computer architecture
error back propagation
recurrent neural networks
training algorithm
parallel implementation
structured prediction