Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training.

Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu
Published in: CoRR (2024)
Keyphrases