SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.

Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang
Published in: CoRR (2024)
Keyphrases