A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length.
Yuqing Yang
Yuedong Xu
Lei Jiao
Published in:
CoRR (2024)
Keyphrases
data analysis
low latency
high speed
high bandwidth
high throughput
highly efficient
virtual machine
real time
long running
massive scale
databases
relational databases
constraint satisfaction