LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training.
Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng. Published in: CoRR (2024)