How to set AdamW's weight decay as you scale model and dataset size.
Xi Wang
Laurence Aitchison
Published in:
CoRR (2024)
Keyphrases
formal model
probabilistic model
small number
theoretical analysis
mathematical model
statistical model
evolutionary algorithm
theoretical framework
high level
lower bound
data model
cost function
hidden markov models
feature set
hierarchical structure