XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference.
João Monteiro
Étienne Marcotte
Pierre-André Noël
Valentina Zantedeschi
David Vázquez
Nicolas Chapados
Christopher Pal
Perouz Taslakian
Published in:
CoRR (2024)
Keyphrases
cost effective
response time
context aware
contextual information
search engine
probabilistic inference
markov random field
context sensitive
prefetching