Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention.

Published in: USENIX ATC (2024)

Keyphrases