GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM.

Published in: CoRR (2024)
