Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models.
Bowen Pan
Yikang Shen
Haokun Liu
Mayank Mishra
Gaoyuan Zhang
Aude Oliva
Colin Raffel
Rameswar Panda
Published in:
CoRR (2024)
Keyphrases
language model
mixture model
training set
probabilistic model
web search
n-gram
context-sensitive
information retrieval
Bayesian networks
kNN
generative model
document retrieval
language modeling
language models for information retrieval