Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Zhenyu Zhang
Shiwei Liu
Runjin Chen
Bhavya Kailkhura
Beidi Chen
Atlas Wang
Published in:
MLSys (2024)
Keyphrases
high dimensional
efficient learning
database
main memory
bayesian inference
highly efficient
probabilistic model
probabilistic inference
back end