Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song
Published in: Proc. VLDB Endow. (2023)
Keyphrases
- highly efficient
- generative model
- low cost
- high dimensional
- prior knowledge
- semi supervised
- low complexity
- real time
- sparse representation
- active learning
- search algorithm
- image processing