LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores.

Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang
Published in: SC (2021)