Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference.
Jiaming Tang
Yilong Zhao
Kan Zhu
Guangxuan Xiao
Baris Kasikci
Song Han
Published in:
CoRR (2024)
Keyphrases
query processing
data retrieval
data structure
response time
contextual information
relevance feedback
database
range queries
high dimensional
query interface
efficient learning
query execution
retrieval method
bayesian inference
search queries
vector space
user queries
query expansion