Stabilizing Transformer Training by Preventing Attention Entropy Collapse.
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M. Susskind
Published in: CoRR (2023)
Keyphrases
- training set
- database
- training phase
- fuzzy logic
- training process
- supervised learning
- information theory
- training samples
- databases
- training algorithm
- information theoretic
- power system
- training examples
- online learning
- mutual information
- test set
- image registration
- case study
- real world
- nonlinear systems
- information entropy
- fuzzy entropy
- partial discharge