Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity.

Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica
Published in: CoRR (2024)
Keyphrases