On the effect of dropping layers of pre-trained transformer models.

Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov

Published in: Comput. Speech Lang. (2023)

Keyphrases

pre-trained
wide range
probabilistic model
object detection
text classification
parametric models