Mixhead: Breaking the low-rank bottleneck in multi-head attention language models.
Zhong Zhang, Nian Shao, Chongming Gao, Rui Miao, Qinli Yang, Junming Shao
Published in: Knowl. Based Syst. (2022)
Keyphrases
- language model
- low rank
- language modeling
- linear combination
- convex optimization
- matrix factorization
- low rank matrix
- matrix completion
- missing data
- probabilistic model
- singular value decomposition
- n gram
- document retrieval
- rank minimization
- high order
- information retrieval
- retrieval model
- test collection
- high dimensional data
- semi supervised
- smoothing methods
- language models for information retrieval
- query expansion
- vector space model
- trace norm
- query terms
- relevance model
- machine learning
- data analysis