Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization.
Jungi Lee, Wonbeom Lee, Jaewoong Sim
Published in: ISCA (2024)
Keyphrases
- language model
- tensor decomposition
- data representation
- auxiliary information
- language modeling
- high order
- n gram
- information retrieval
- document retrieval
- retrieval model
- query expansion
- probabilistic model
- tensor factorization
- low rank
- test collection
- visual data
- translation model
- query terms
- image retrieval
- pairwise
- video sequences
- training data