Should I try multiple optimizers when fine-tuning pre-trained Transformers for NLP tasks? Should I tune their hyperparameters?
Nefeli Gkouti
Prodromos Malakasiotis
Stavros Toumpis
Ion Androutsopoulos
Published in: CoRR (2024)
Keyphrases
fine-tuning
hyperparameters
pre-trained
cross-validation
model selection
random sampling
computer vision
closed-form
prior information
sample size
feature extraction
edge detection
missing data
Bayesian framework
Bayesian inference
small number
training data