KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava
Published in: CoRR (2024)