Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.

Published in: CoRR (2024)

Keyphrases