A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length.

Yuqing Yang, Yuedong Xu, Lei Jiao
Published in: CoRR (2024)
Keyphrases
  • data analysis
  • low latency
  • high speed
  • high bandwidth
  • high throughput
  • highly efficient
  • virtual machine
  • real time
  • long running
  • massive scale
  • databases
  • relational databases
  • constraint satisfaction