DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.

Published in: CoRR (2024)

Keyphrases