Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.
Saeed Rashidi, Pallavi Shurpali, Srinivas Sridharan, Naader Hassani, Dheevatsa Mudigere, Krishnakumar Nair, Misha Smelyanski, Tushar Krishna
Published in: Hot Interconnects (2020)