High-throughput Generative Inference of Large Language Models with a Single GPU.

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark W. Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
Published in: CoRR (2023)
Keyphrases