Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs.

Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak
Published in: CoRR (2024)