Parallel Attention and Feed-Forward Net Design for Pre-training and Inference on Transformers.
Shashank Sonkar, Richard G. Baraniuk. Published in: CoRR (2023)
Keyphrases
- feed forward
- back propagation
- recurrent networks
- feed forward neural networks
- case study
- artificial neural networks
- hidden layer
- neural network
- design process
- training examples
- neural nets
- activation function
- computer architecture
- error back propagation
- recurrent neural networks
- training algorithm
- parallel implementation
- structured prediction