Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs.

Published in: CoRR (2024)

Keyphrases