Publication: Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.