Symmetric Dot-Product Attention for Efficient Training of BERT Language Models.
Martin Courtois, Malte Ostendorff, Leonhard Hennig, Georg Rehm
Published in: CoRR (2024)
Keyphrases
- language model
- language modeling
- dot product
- n-gram
- document retrieval
- probabilistic model
- language modelling
- speech recognition
- information retrieval
- mixture model
- test collection
- query expansion
- statistical language models
- positive semi-definite
- machine learning
- language models for information retrieval
- retrieval model
- Euclidean space