CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference.

Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang
Published in: CoRR (2024)