Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers.
Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
Published in: CoRR (2021)
Keyphrases
- fine tuning
- fine tuned
- viable alternative
- fine tune
- scale space
- test set
- search algorithm
- database
- small scale
- training process
- training phase
- real world
- decision trees
- support vector
- active learning
- supervised learning
- clustering algorithm
- scale invariant
- information systems
- computer software
- machine learning
- databases