Zhuomin He
Publication Activity (10 Years)
Years Active: 2024-2024
Publications (10 Years): 3
Top Topics
Query Expansion
Ad Hoc Information Retrieval
Smoothing Methods
N Gram
Top Venues
USENIX ATC
CoRR
ICPP
Publications
Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, Pengfei Zuo
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention.
USENIX ATC (2024)
Bin Gao, Zhehui Wang, Zhuomin He, Tao Luo, Weng-Fai Wong, Zhi Zhou
IMI: In-memory Multi-job Inference Acceleration for Large Language Models.
ICPP (2024)
Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, Pengfei Zuo
AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving.
CoRR (2024)