Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective.

Published in: CoRR (2024)

Keyphrases