Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers.
Philip Kenneweg
Alexander Schulz
Sarah Schröder
Barbara Hammer
Published in:
CoRR (2024)
Keyphrases
learning rate
convergence rate
learning algorithm
error function
adaptive learning rate
hidden layer
uniform convergence
rapid convergence
multilayer neural networks
convergence speed
activation function
weight vector
convergence theorem
search space
multi class
natural gradient