GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM.
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
Published in: CoRR (2024)
Keyphrases
- compression scheme
- compression algorithm
- compression ratio
- lossless compression
- image compression
- data compression
- arithmetic coding
- query processing
- belief networks
- image coding
- probabilistic inference
- generative model
- block size
- wavelet transform
- bitstream
- main memory
- Bayesian networks
- Bayesian inference
- fault diagnosis
- inference process
- entropy coding
- context modeling
- lossless image compression
- JPEG-LS