Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.
Muhammad Adnan
Akhil Arunkumar
Gaurav Jain
Prashant J. Nair
Ilya Soloveychik
Purushotham Kamath
Published in:
CoRR (2024)
Keyphrases
generative model
neural network
cost effective
data sets
computationally efficient
inference process
genetic algorithm
search engine
query processing
data driven
probabilistic inference
selection algorithm