No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization.
June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, Dongsoo Lee
Published in: CoRR (2024)
Keyphrases
- lossy image compression
- compression scheme
- quantization noise
- data compression
- efficient compression
- transmission line
- quantization scheme
- high precision
- uniform quantization
- Huffman coding
- transform coding
- image compression
- compression algorithm
- compression ratio
- wavelet image coding
- prefetching
- precision and recall
- hit rate
- cache misses
- query processing
- block size
- main memory
- quantization step
- data structure