CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving.

Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
Published in: SIGCOMM (2024)
Keyphrases