SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.
Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang
Published in: CoRR (2024)