Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.
Sameer Deshmukh
Rio Yokota
George Bosilca
Published in:
CoRR (2023)
Keyphrases
matrix multiplication
memory hierarchy
distributed memory
parallel algorithm
embedded processors
multithreading
parallel processing
computer architecture
computer vision
lower bound
higher order
computer systems
message passing
multi-core processors