On the effect of dropping layers of pre-trained transformer models.

Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
Published in: Comput. Speech Lang. (2023)
Keyphrases
  • pre-trained
  • wide range
  • probabilistic model
  • object detection
  • text classification
  • parametric models