Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.
Sameer Deshmukh
Rio Yokota
George Bosilca
Published in: ACM Trans. Math. Softw. (2023)
Keyphrases
matrix multiplication
memory hierarchy
distributed memory
embedded processors
objective function
parallel algorithm
multithreading
multiprocessor systems
dynamic programming
parallel processing