Prompt Cache: Modular Attention Reuse for Low-Latency Inference.
In Gim
Guojun Chen
Seung-Seob Lee
Nikhil Sarda
Anurag Khandelwal
Lin Zhong
Published in:
MLSys (2024)
Keyphrases
low latency
high bandwidth
high speed
high throughput
virtual machine
massive scale
highly efficient
real time
continuous query processing
query processing
data sets
stream processing
data access
low complexity
main memory
data collection
data management