SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models.
Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin
Published in: CoRR (2024)
Keyphrases
- sliding window
- language model
- language modeling
- data streams
- n gram
- speech recognition
- window size
- probabilistic model
- language modelling
- retrieval model
- fixed size
- statistical language models
- document retrieval
- test collection
- information retrieval
- stream data
- streaming data
- query expansion
- pseudo relevance feedback
- ad hoc information retrieval
- context sensitive
- limited memory
- document ranking
- smoothing methods
- relevance model
- variable size
- query terms
- language models for information retrieval
- continuous queries
- vector space model