Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Aran Komatsuzaki
Joan Puigcerver
James Lee-Thorp
Carlos Riquelme Ruiz
Basil Mustafa
Joshua Ainslie
Yi Tay
Mostafa Dehghani
Neil Houlsby
Published in:
CoRR (2022)
Keyphrases
high dimensional
training process
motion field estimation
sparse data
expectation maximization
training examples
dense stereo
dense optical flow
gaussian distribution
test set
data sets
mixture model
training set
neural network
domain experts
domain knowledge
optical flow
artificial neural networks