InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim. Published in: OSDI (2024)
Keyphrases
- language model
- cache management
- document retrieval
- speech recognition
- information retrieval
- n-gram
- statistical language models
- query expansion
- test collection
- probabilistic model
- retrieval model
- context sensitive
- relevance model
- document ranking
- generative model
- machine learning
- power consumption
- distributed object