Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints.
Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby
Published in: ICLR (2023)
Keyphrases
- avoid overfitting
- dense sampling
- training set
- high dimensional
- training process
- domain experts
- motion field estimation
- dense optical flow
- compressed sensing
- sparse data
- training phase
- test set
- artificial neural networks
- e learning
- training examples
- mixture model
- supervised learning
- expert finding
- sparse reconstruction
- support vector
- decision trees
- dense stereo
- data sets