Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training.
Wenyu Du
Tongxu Luo
Zihan Qiu
Zeyu Huang
Yikang Shen
Reynold Cheng
Yike Guo
Jie Fu
Published in:
CoRR (2024)
Keyphrases
probabilistic model
computational model
high level
prior knowledge
statistical model
cost function
experimental data
probability distribution
multi class
em algorithm
theoretical analysis
simulation model
formal model