Self-Distillation for Further Pre-training of Transformers.
Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi
Published in: CoRR (2022)
Keyphrases
- training phase
- artificial intelligence
- training algorithm
- online learning
- artificial neural networks
- active learning
- feed forward neural networks
- training process
- test set
- back propagation
- training examples
- supervised learning
- probabilistic model
- data sets
- training set
- multiscale
- metadata
- information systems
- computer vision
- genetic algorithm
- information retrieval
- machine learning