FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines.

Jiaao He, Jidong Zhai
Published in: CoRR (2024)
Keyphrases