Extremely Small BERT Models from Mixed-Vocabulary Training.

Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
Published in: EACL (2021)
Keyphrases
  • probabilistic model
  • neural network
  • real world
  • training set
  • prior knowledge
  • experimental data
  • metadata
  • pairwise
  • small number
  • model selection
  • parameter estimation
  • process model
  • labeled data for training