FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song
Published in: CoRR (2024)