FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics.

Published in: MLSys (2024)

Keyphrases