DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving.
Foteini Strati
Sara McAllister
Amar Phanishayee
Jakub Tarnawski
Ana Klimovic
Published in: CoRR (2024)
Keyphrases
fault tolerant
fault tolerance
data streams
generative model
distributed systems
transmission line
prefetching
high availability
video streaming
load balancing
data access
query processing
safety critical
main memory
state machine
fault isolation
multi agent
response time
interconnection networks
streaming media