MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.
Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
Published in: CoRR (2024)
Keyphrases
- language model
- language modeling
- document retrieval
- n gram
- probabilistic model
- speech recognition
- information retrieval
- retrieval model
- statistical language models
- language modelling
- query expansion
- test collection
- vector space model
- language model for information retrieval
- language models for information retrieval
- context sensitive
- smoothing methods
- query terms
- word error rate
- pseudo relevance feedback
- query processing
- relevance model
- document length
- bayesian networks
- machine learning