Overview
- recent advances
- neural network training
- linear algebra
- data transfer
- matrix multiplication
Publications
Tightening I/O Lower Bounds through the Hourglass Dependency Pattern. CoRR.
Tightening I/O Lower Bounds through the Hourglass Dependency Pattern. SPAA.
Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch. ICML.
On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM). IPDPS.
Data Distribution Schemes for Dense Linear Algebra Factorizations on Any Number of Nodes. IPDPS.
I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels. SPAA.