Should I try multiple optimizers when fine-tuning a pre-trained Transformer for NLP tasks? Should I tune their hyperparameters?
Nefeli Gkouti
Prodromos Malakasiotis
Stavros Toumpis
Ion Androutsopoulos
Published in: EACL (1) (2024)
Keyphrases
fine-tuning
hyperparameters
pre-trained
Bayesian inference
cross-validation
model selection
training data
Bayesian framework
prior information
closed form
non-stationary
neural network
small number
state space
video sequences
image sequences
data mining