RelayAttention for Efficient Large Language Model Serving with Long System Prompts.
Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson W. H. Lau. Published in: ACL (1) (2024)
Keyphrases
- language model
- language modeling
- n gram
- document retrieval
- probabilistic model
- query expansion
- retrieval model
- smoothing methods
- language modelling
- mixture model
- speech recognition
- information retrieval
- query terms
- statistical language models
- context sensitive
- translation model
- vector space model
- language model for information retrieval
- information retrieval systems
- ad hoc information retrieval
- word clouds