LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training.
Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng. Published in: CoRR (2024)