Depth Dependence of μP Learning Rates in ReLU MLPs.
Samy Jelassi
Boris Hanin
Ziwei Ji
Sashank J. Reddi
Srinadh Bhojanapalli
Sanjiv Kumar
Published in:
CoRR (2023)
Keyphrases
learning rate
activation function
learning algorithm
convergence rate
Gaussian kernels
hidden layer
multilayer perceptron
convergence speed
uniform convergence
covering numbers
multi layer perceptron
image compression
generalization ability
convergence theorem