Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.

Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath
Published in: CoRR (2024)
Keyphrases
  • generative model
  • neural network
  • cost effective
  • data sets
  • computationally efficient
  • inference process
  • genetic algorithm
  • search engine
  • query processing
  • data driven
  • probabilistic inference
  • selection algorithm