LiveMind: Low-latency Large Language Models with Simultaneous Inference.

Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

Published in: CoRR (2024)

Keyphrases

language model
low latency
language modeling
high speed
n gram
language modelling
high throughput
document retrieval
probabilistic model
speech recognition
real time
retrieval model
virtual machine
highly efficient
test collection
smoothing methods
statistical language models
information retrieval
query expansion
stream processing
query terms
bayesian networks
language models for information retrieval
low cost