KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization.
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Michael W. Mahoney
Yakun Sophia Shao
Kurt Keutzer
Amir Gholami
Published in: CoRR (2024)
Keyphrases
contextual information
context sensitive
bayesian networks
context aware
total length
database
prefetching
data access
query processing
real world
database management systems
information extraction
high quality
context awareness
access patterns
quantization error
neural network