On the effect of dropping layers of pre-trained transformer models.
Hassan Sajjad
Fahim Dalvi
Nadir Durrani
Preslav Nakov
Published in:
Comput. Speech Lang. (2023)
Keyphrases
pre-trained
wide range
probabilistic model
object detection
text classification
parametric models