MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention.
Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu
Published in: CoRR (2024)