KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation.
Minsik Cho
Mohammad Rastegari
Devang Naik
Published in:
CoRR (2024)
Keyphrases
causal inference
bayesian networks
parallel processing
probabilistic inference
inference process
causal relations
causal theories
causal reasoning
query processing
transmission line
highly scalable
generation process
multithreading
causal networks
prefetching
causal independence
cache misses