On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models.

Sean Farhat, Deming Chen
Published in: CoRR (2024)