Prompt Cache: Modular Attention Reuse for Low-Latency Inference
In Gim
Guojun Chen
Seung-Seob Lee
Nikhil Sarda
Anurag Khandelwal
Lin Zhong
Published in: CoRR (2023)
Keyphrases
low latency
high bandwidth
high throughput
high speed
real time
highly efficient
massive scale
virtual machine
query processing
main memory
continuous query processing
stream processing
distributed systems
network traffic