Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.

Published in: ACM Trans. Math. Softw. (2023)

Keyphrases