Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.
Sameer Deshmukh
Rio Yokota
George Bosilca
Published in:
ACM Trans. Math. Softw. (2023)
Keyphrases
matrix multiplication
memory hierarchy
distributed memory
embedded processors
objective function
parallel algorithm
multithreading
multiprocessor systems
dynamic programming
parallel processing