A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length.

Yuqing Yang, Yuedong Xu, Lei Jiao

Published in: CoRR (2024)

Keyphrases

data analysis
low latency
high speed
high bandwidth
high throughput
highly efficient
virtual machine
real time
long running
massive scale
databases
relational databases
constraint satisfaction