Extremely Small BERT Models from Mixed-Vocabulary Training

Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou

Published in: EACL (2021)

Keyphrases

probabilistic model
neural network
real world
training set
prior knowledge
experimental data
metadata
pairwise
small number
model selection
parameter estimation
process model
labeled data for training