M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining.
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
Xianyan Jia
Ang Wang
Jie Zhang
Yong Li
Wei Lin
Jingren Zhou
Hongxia Yang
Published in:
CoRR (2021)
Keyphrases
lower bound
computationally efficient
database
data sets
real world
decision trees
information technology
association rules
multiresolution
mobile robot
computationally expensive
highly efficient