Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.

Published in: ECAI (2023)

Keyphrases