Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.

Published in: CoRR (2024)
