Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns.
Brian DuSell, David Chiang
Published in: CoRR (2023)
Keyphrases
- computational model
- high level
- experimental data
- mathematical model
- formal model
- probabilistic model
- probability distribution
- management system
- hierarchical model
- real time
- conceptual model
- hierarchical structure
- statistical model
- input data
- cost function
- artificial neural networks
- data structure
- objective function
- case study
- learning algorithm
- data sets