A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale.

Published in: CoRR (2023)

Keyphrases