Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.

Sameer Deshmukh, Rio Yokota, George Bosilca
Published in: ACM Trans. Math. Softw. (2023)
Keyphrases
  • matrix multiplication
  • memory hierarchy
  • distributed memory
  • embedded processors
  • objective function
  • parallel algorithm
  • multithreading
  • multiprocessor systems
  • dynamic programming
  • parallel processing