XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference.

João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian
Published in: CoRR (2024)
Keyphrases
  • cost effective
  • response time
  • context aware
  • contextual information
  • search engine
  • probabilistic inference
  • markov random field
  • context sensitive
  • prefetching