Multi-Head Attention with Disagreement Regularization.
Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang
Published in: EMNLP (2018)
Keyphrases
- real time
- website
- active learning
- neural network
- empirical risk minimization
- focus of attention
- regularization parameter
- special case
- multiscale
- visual attention
- prior information
- information systems
- search engine
- reproducing kernel Hilbert space
- regularization framework
- regularization method
- regularization methods
- information retrieval