MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention.

Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu
Published in: CoRR (2024)