Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions.
Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy, Gerald Baumgartner, J. Ramanujam, P. Sadayappan
Published in: J. Parallel Distributed Comput. (2012)