Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training.

Published in: CoRR (2023)