Linear attention is (maybe) all you need (to understand Transformer optimization).
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
Published in: ICLR (2024)